Error: Could not find or load main class Dlog4j.configuration while executing Airflow DAG

3/5/2020

I have an EKS cluster where I run my Airflow DAGs. I want to supply a log4j.properties file (a different one per code base) when running a Spark job. It runs fine when I log into one of the containers and run spark-submit directly.

Command:

spark-submit \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=XXXX \
  --files "/tmp/log4j.properties" \
  --driver-java-options "-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=SparkPi \
  --conf spark.kubernetes.driver.container.image=XXXX:XX \
  --conf spark.kubernetes.executor.container.image=XXXX:XX \
  local:///XXXX/XXXX.py

But when I run the same code through a DAG, I get Error: Could not find or load main class Dlog4j.configuration. Below is the configuration I set for log4j.

conf_dict = {
    "Dlog4j.configuration": "file:/tmp/p8vl-customer-segmentation"
                            "/segmentation_de/log/log4j.properties",
    "spark.driver.extraJavaOptions": "Dlog4j.configuration="
                                     "file:/tmp/p8vl-customer-segmentation/"
                                     "segmentation_de/log/log4j.properties"
}
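As far as I understand, SparkSubmitOperator hands every entry of conf to spark-submit as its own --conf key=value flag, so my reconstruction (not the actual operator log) of what these two entries become on the generated command line is roughly:

--conf "Dlog4j.configuration=file:/tmp/p8vl-customer-segmentation/segmentation_de/log/log4j.properties" \
--conf "spark.driver.extraJavaOptions=Dlog4j.configuration=file:/tmp/p8vl-customer-segmentation/segmentation_de/log/log4j.properties"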

conf_dict also contains other cluster-related configuration. I then pass conf_dict to SparkSubmitOperator:

SparkSubmitOperator(
    task_id='XXXX',
    conn_id='spark_default',
    name='XXXX',
    py_files='{0}/code.zip,{0}/venv2.zip'.format(local_path),
    application='{}/XXXX.py'.format(local_path),
    files='{}/log/log4j.properties'.format(local_path),
    conf=conf_dict,
    verbose=True,
    dag=dag
)
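For reference, this is how I expect the operator arguments above to map onto spark-submit flags. It is a rough sketch rather than the command Airflow actually logged: the --master value comes from the spark_default connection, {local_path} stands for its expanded value, and the two --conf values are abbreviated to the entries shown earlier.

spark-submit \
  --master <host from the spark_default connection> \
  --conf "Dlog4j.configuration=..." \
  --conf "spark.driver.extraJavaOptions=..." \
  --files {local_path}/log/log4j.properties \
  --py-files {local_path}/code.zip,{local_path}/venv2.zip \
  --name XXXX \
  --verbose \
  {local_path}/XXXX.py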

But when I execute the DAG, I get the above-mentioned error.

-- vkSinha
airflow-operator
amazon-eks
apache-spark
kubernetes
log4j

0 Answers