spark-on-k8s resource staging server with Python

2/10/2018

I've been following the Running Spark on Kubernetes docs with spark-on-k8s v2.2.0-kubernetes-0.5.0, Kubernetes v1.9.0, and Minikube v0.25.0.

I am able to successfully run a Python job with this command:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  local:///opt/spark/examples/src/main/python/pi.py 10
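
For what it's worth, the local:// URIs above refer to paths that are already baked into the driver and executor images, so nothing is uploaded from the submission machine. If you want to double-check that the path exists inside the image (assuming Docker is available locally), something like this works:

# Lists the Python examples baked into the driver image; assumes local Docker
docker run --rm --entrypoint ls \
  kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  /opt/spark/examples/src/main/python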

I am able to successfully run a Java job with local dependencies (after setting up the resource staging server) with this command:

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  ./examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar
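
For anyone reproducing this: the resource staging server itself was deployed from the YAML shipped in the distribution's conf directory, roughly as below (the manifest exposes it as a NodePort service, 31000 in my setup; exact file and service names may differ between versions):

# File name per the Running Spark on Kubernetes docs for this fork
kubectl create -f conf/kubernetes-resource-staging-server.yaml
# Note the NodePort, which is what spark.kubernetes.resourceStagingServer.uri points at
kubectl get svc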

Is it possible to run Python jobs with local dependencies as well? I tried this command, but it failed:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  ./examples/src/main/python/pi.py 10

I get this error in the driver's logs:

Error: Could not find or load main class .opt.spark.jars.RoaringBitmap-0.5.11.jar

And these errors in the event logs:

MountVolume.SetUp failed for volume "spark-init-properties" : configmaps "spark-pi-1518224354203-init-config" not found
...
MountVolume.SetUp failed for volume "spark-init-secret" : secrets "spark-pi-1518224354203-init-secret" not found
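
For reference, I pulled these with kubectl; something along these lines (substitute the actual driver pod name):

kubectl get pods                     # find the driver pod for this submission
kubectl logs <driver-pod>            # driver log with the main class error; <driver-pod> is a placeholder
kubectl describe pod <driver-pod>    # pod events, including the MountVolume failures
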
-- David
Tags: apache-spark, kubernetes, pyspark, python

1 Answer

2/14/2018

The fix is to provide the examples jar as a dependency via --jars:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://10.128.0.4:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=1 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://10.128.0.4:31000 \
  --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  ./examples/src/main/python/pi.py 10

I'm not sure why this works (RoaringBitmap-0.5.11.jar should already be present in /opt/spark/jars inside the driver image and on the classpath anyway), but it solves my issue for now.
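
If anyone wants to verify, the jar does seem to be in the image; while the driver pod is still running, something like this shows it:

# <driver-pod> is a placeholder for the actual driver pod name
kubectl exec <driver-pod> -- ls /opt/spark/jars | grep -i roaring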

-- David
Source: StackOverflow