We are migrating our Spark workloads from Cloudera to Kubernetes.
For demo purposes, we wish to run one of our Spark jobs on a minikube cluster using spark-submit in cluster mode.
I would like to pass a Typesafe application.conf file as a Java option to both the driver and executor JVMs, using the spark.driver.defaultJavaOptions and spark.executor.defaultJavaOptions configuration properties (set via --conf).
The configuration file is copied into the Spark Docker image at build time, under the /opt/spark/config directory. The same Docker image is used to run the driver and executor pods.
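For reference, the relevant part of our Dockerfile looks roughly like this (the base image tag and the source path are illustrative, not our exact values):

```dockerfile
# Image built on top of the official Spark image
FROM apache/spark:3.3.0

# Bake the Typesafe config into the image so driver and executor
# pods both see it at the same path
COPY conf/application.conf /opt/spark/config/application.conf
```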
application.conf is passed as follows:
--conf spark.driver.defaultJavaOptions="-Dconfig.file=file://${POD_CONFIG_DIR}/application.conf $JAVA_ARGS" \
--conf spark.executor.defaultJavaOptions="-Dconfig.file=file://${POD_CONFIG_DIR}/application.conf" \
where ${POD_CONFIG_DIR} is /opt/spark/config
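To make the resulting JVM flag explicit, here is what that option expands to once POD_CONFIG_DIR is substituted (a standalone sketch, with the variable inlined rather than taken from my launcher script):

```shell
#!/bin/sh
# Reproduce the expansion that spark-submit receives on the command line
POD_CONFIG_DIR=/opt/spark/config
DRIVER_OPTS="-Dconfig.file=file://${POD_CONFIG_DIR}/application.conf"
echo "$DRIVER_OPTS"
```

So the driver JVM is started with -Dconfig.file=file:///opt/spark/config/application.conf (three slashes, since the absolute path is appended directly to the file:// scheme).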
My job doesn't behave correctly: it falls back to the default values defined in the reference.conf file. I don't get any FileNotFoundException, though. What could I be missing? Thank you in advance.
Here is my full spark-submit command:
spark-submit \
--master k8s://https://192.168.49.2:8443 \
--driver-memory ${SPARK_DRIVER_MEMORY} --executor-memory ${SPARK_EXECUTOR_MEMORY} \
--deploy-mode cluster \
--class "${MAIN_CLASS}" \
--conf spark.driver.defaultJavaOptions="-Dconfig.file=file://${POD_CONFIG_DIR}/application.conf $JAVA_ARGS" \
--conf spark.executor.defaultJavaOptions="-Dconfig.file=file://${POD_CONFIG_DIR}/application.conf" \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=$SPARK_CONTAINER_IMAGE \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kryoserializer.buffer.max=512M \
--conf spark.driver.maxResultSize=8192M \
--conf spark.kubernetes.authenticate.caCertFile=$HOME/.minikube/ca.crt \
--conf spark.executor.extraClassPath="./" \
local:///path/to/uber/jar/file.jar \
"${PROG_ARGS[@]}" > $LOG_FILE 2>&1