I have an EKS cluster where I run my Airflow DAGs. I want to supply a log4j.properties file (a different one for each code base) when running a Spark job. It works fine when I log into one of the containers and run spark-submit directly.
Command:
spark-submit \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=XXXX \
  --files "/tmp/log4j.properties" \
  --driver-java-options "-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=SparkPi \
  --conf spark.kubernetes.driver.container.image=XXXX:XX \
  --conf spark.kubernetes.executor.container.image=XXXX:XX \
  local:///XXXX/XXXX.py
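For reference, my understanding is that --driver-java-options is equivalent to setting the spark.driver.extraJavaOptions conf, so the same working submit should also be expressible purely with --conf flags (this is an assumption on my part; abbreviated to the log4j-related options):

```shell
# Assumed-equivalent form of the working submit, using only --conf:
# --driver-java-options and spark.driver.extraJavaOptions should be interchangeable.
spark-submit \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=XXXX \
  --files "/tmp/log4j.properties" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.debug=true -Dlog4j.configuration=file:/tmp/log4j.properties" \
  local:///XXXX/XXXX.py
```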
But when I run the same code through a DAG, I get this error: Error: Could not find or load main class Dlog4j.configuration. Below is the configuration I set for log4j:
conf_dict = {
    "Dlog4j.configuration": "file:/tmp/p8vl-customer-segmentation"
                            "/segmentation_de/log/log4j.properties",
    "spark.driver.extraJavaOptions": "Dlog4j.configuration="
                                     "file:/tmp/p8vl-customer-segmentation/"
                                     "segmentation_de/log/log4j.properties"
}
conf_dict also contains other cluster-related configuration. I then pass this conf_dict to SparkSubmitOperator:
SparkSubmitOperator(
    task_id='XXXX',
    conn_id='spark_default',
    name='XXXX',
    py_files='{0}/code.zip,{0}/venv2.zip'.format(local_path),
    application='{}/XXXX.py'.format(local_path),
    files='{}/log/log4j.properties'.format(local_path),
    conf=conf_dict,
    verbose=True,
    dag=dag
)
But when I execute the DAG, I get the above-mentioned error.
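Comparing this with the working spark-submit, my suspicion is that the leading -D is the difference: the value Dlog4j.configuration=... reaches the driver JVM as a bare argument, so the JVM parses "Dlog4j.configuration" as a main-class name. A minimal sketch of the conf I believe should work (same paths as above; adding the leading -D is the only substantive change, and the other cluster settings are unchanged):

```python
# Sketch of the suspected fix: extraJavaOptions values must be JVM flags,
# so each system property needs the leading "-D". Without it, the JVM
# treats "Dlog4j.configuration" as the main class to run.
log4j_path = "/tmp/p8vl-customer-segmentation/segmentation_de/log/log4j.properties"

conf_dict = {
    "spark.driver.extraJavaOptions": "-Dlog4j.configuration=file:" + log4j_path,
    "spark.executor.extraJavaOptions": "-Dlog4j.configuration=file:" + log4j_path,
    # ...other cluster-related settings as before...
}
```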