I've been following the Running Spark on Kubernetes docs with spark-on-k8s v2.2.0-kubernetes-0.5.0, Kubernetes v1.9.0, and Minikube v0.25.0.
I am able to successfully run a Python job with this command:
bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://10.128.0.4:8443 \
--kubernetes-namespace default \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
--jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
local:///opt/spark/examples/src/main/python/pi.py 10
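To confirm the run succeeded, I check the driver pod and its logs (the pod name below is illustrative; spark-submit prints the actual name, which includes a submission timestamp):
kubectl get pods --namespace default
kubectl logs spark-pi-<timestamp>-driver | grep "Pi is roughly"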
I am able to successfully run a Java job with local dependencies (after setting up the resource staging server) with this command:
bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://10.128.0.4:8443 \
--kubernetes-namespace default \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
./examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar
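For completeness, the resource staging server was deployed with the example manifest from the docs (the manifest path and the 31000 NodePort come from that example and may differ in other setups):
kubectl create -f conf/kubernetes-resource-staging-server.yaml
kubectl get svc --namespace default   # confirm the staging service exposes NodePort 31000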
Is it possible to run Python jobs with local dependencies? I tried this command and it failed:
bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://10.128.0.4:8443 \
--kubernetes-namespace default \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
./examples/src/main/python/pi.py 10
I get this error in the driver's logs:
Error: Could not find or load main class .opt.spark.jars.RoaringBitmap-0.5.11.jar
And these errors in the event logs:
MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "spark-pi-1518224354203-init-config" not found
...
MountVolume.SetUp failed for volume "spark-init-secret" : secrets "spark-pi-1518224354203-init-secret" not found
The fix is to provide the examples jar as a dependency via --jars:
bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://10.128.0.4:8443 \
--kubernetes-namespace default \
--conf spark.executor.instances=1 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
--conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
--jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
./examples/src/main/python/pi.py 10
I'm not sure why this works (RoaringBitmap-0.5.11.jar should exist in /opt/spark/jars and be added to the classpath in any case), but this solves my issue for now.
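To sanity-check that assumption, the jar can be listed straight from the driver image used above (the entrypoint override is only for inspection):
docker run --rm --entrypoint ls kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 /opt/spark/jars | grep -i roaringbitmap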