Spark job location in Kubernetes cluster - No such file error

10/22/2018

I am trying to submit a Spark application to a Kubernetes cluster. The job file is at /opt/spark/applications/, and I am submitting it with the command below:

${SPARK_PATH}/bin/spark-submit \
--master <K8S_MASTER> \
--deploy-mode cluster \
--name spark-py-driver \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark-py:2.4.0-rc3 \
--conf spark.kubernetes.driver.pod.name=spark-py-driver \
--conf spark.executor.memory=2g \
--conf spark.driver.memory=2g \
local:///opt/spark/applications/spark_submit_test_job.py

I am getting a 'No such file or directory' error, even though the job file exists at that path on the node.

python: can't open file '/opt/spark/applications/spark_submit_test_job.py': [Errno 2] No such file or directory
-- Lakshman Battini
apache-spark
kubernetes
python

1 Answer

10/22/2018

As the documentation on running Spark on Kubernetes states:

Finally, notice that in the above example we specify a jar with a specific URI with a scheme of local://. This URI is the location of the example jar that is already in the Docker image.

The local:// scheme refers to a path inside the container image, not on the node running spark-submit. You need to build your Docker image with the script already present in it.
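
For example, here is a minimal sketch of building such an image. The job filename and path match the question; the registry name and the derived image tag are assumptions you would replace with your own:

# Write a Dockerfile that extends the existing spark-py image and
# copies the job script to the path referenced by the local:// URI
cat > Dockerfile <<'EOF'
FROM spark-py:2.4.0-rc3
COPY spark_submit_test_job.py /opt/spark/applications/spark_submit_test_job.py
EOF

# Build and push the image so the Kubernetes nodes can pull it
docker build -t <REGISTRY>/spark-py:2.4.0-with-job .
docker push <REGISTRY>/spark-py:2.4.0-with-job

After that, point spark.kubernetes.container.image at the new tag and keep the local:///opt/spark/applications/spark_submit_test_job.py URI unchanged.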

A better solution would be to use a repository that can be pulled when the image runs, or instead you can use Remote Dependencies (referencing the application file by a remote URI such as https:// or hdfs:// so Spark fetches it for you), as sketched below.
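
As a sketch of the remote-dependency approach, assuming the script is hosted at a reachable HTTP(S) location (the URL below is hypothetical), you pass that URI instead of a local:// path:

# Submit with a remote URI; Spark downloads the application file itself,
# so the container image does not need to contain the script
${SPARK_PATH}/bin/spark-submit \
--master <K8S_MASTER> \
--deploy-mode cluster \
--name spark-py-driver \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark-py:2.4.0-rc3 \
--conf spark.kubernetes.driver.pod.name=spark-py-driver \
https://example.com/jobs/spark_submit_test_job.py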

-- Crou
Source: StackOverflow