SparkPi on Kubernetes - Could not find or load main class?

7/22/2018

I'm trying to run the standard SparkPi example on a Kubernetes cluster. spark-submit creates the pod, which then fails with the error "Error: Could not find or load main class org.apache.spark.examples.SparkPi".

spark-submit

spark-submit \
--master k8s://https://k8s-cluster:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=ca-app \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar

Kubernetes creates 2 containers in the pod. The spark-init container logs that the examples jar was downloaded:

2018-07-22 15:13:35 INFO  SparkPodInitContainer:54 - Downloading remote jars: Some(https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar,https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar)
2018-07-22 15:13:35 INFO  SparkPodInitContainer:54 - Downloading remote files: None
2018-07-22 15:13:37 INFO  Utils:54 - Fetching https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp6219129583337519707.tmp
2018-07-22 15:13:37 INFO  Utils:54 - Fetching https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp8698641635325948552.tmp
2018-07-22 15:13:37 INFO  SparkPodInitContainer:54 - Finished downloading application dependencies.

The spark-kubernetes-driver container then fails with the error:

+ readarray -t SPARK_JAVA_OPTS
+ '[' -n /var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar'
+ '[' -n /var/spark-data/spark-files ']'
+ cp -R /var/spark-data/spark-files/. .
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.app.id=spark-e032bc91fc884e568b777f404bfbdeae -Dspark.kubernetes.container.image=gcr.io/cloud-solutions-images/spark:v2.3.0-gcs -Dspark.kubernetes.namespace=ca-app -Dspark.jars=https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar,https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar -Dspark.driver.host=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-driver-svc.ca-app.svc -Dspark.master=k8s://https://k8s-cluster:6443 -Dspark.kubernetes.initContainer.configMapName=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-init-config -Dspark.kubernetes.authenticate.driver.serviceAccountName=default -Dspark.driver.port=7078 -Dspark.kubernetes.driver.pod.name=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0-driver -Dspark.app.name=spark-pi -Dspark.kubernetes.executor.podNamePrefix=spark-pi-11f2cd9133b33fc480a7b2f1d5c2fcc0 -Dspark.driver.blockManager.port=7079 -Dspark.submit.deployMode=cluster -Dspark.executor.instances=5 -Dspark.kubernetes.initContainer.configMapKey=spark-init.properties -cp ':/opt/spark/jars/*:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar:/var/spark-data/spark-jars/spark-examples_2.11-2.3.1.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=10.233.71.5 org.apache.spark.examples.SparkPi
Error: Could not find or load main class org.apache.spark.examples.SparkPi

What am I doing wrong? Thanks for the tips.

-- JDev
apache-spark
kubernetes
spark-submit

1 Answer

7/24/2018

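I would suggest using https://github.com/JWebDev/spark/raw/master/spark-examples_2.11-2.3.1.jar instead. The /blob/ URL is the HTML view of an asset, whereas /raw/ will 302-redirect to the actual storage URL for it. With the /blob/ URL, the init container most likely saved an HTML page under the jar's file name, so the JVM could not load org.apache.spark.examples.SparkPi from it.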
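
As a rough illustration of the difference, the sketch below rewrites a /blob/ URL to its /raw/ form and checks whether a downloaded file at least starts with the ZIP magic bytes "PK", which a valid jar does and an HTML page does not. The helper names to_raw_url and looks_like_jar are hypothetical, made up for this example; they are not part of Spark or GitHub tooling.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a GitHub "blob" page URL into its "raw" URL.
# Assumes the URL contains a single /blob/ path segment.
to_raw_url() {
  printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

# Hypothetical helper: succeed only if the file starts with "PK",
# the first two bytes of any ZIP archive (and therefore of any jar).
looks_like_jar() {
  [ "$(head -c 2 "$1")" = "PK" ]
}

# The URL from the question, converted to the form suggested above.
JAR_URL=$(to_raw_url "https://github.com/JWebDev/spark/blob/master/spark-examples_2.11-2.3.1.jar")
echo "$JAR_URL"
```

To use it, download the file with something like curl -fsSL "$JAR_URL" -o /tmp/spark-examples.jar (the -L flag follows the redirect) and then run looks_like_jar /tmp/spark-examples.jar before passing the URL to spark-submit. This only checks the file header; it does not prove that the jar contains the SparkPi class.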

-- mdaniel
Source: StackOverflow