Spark 2.3 - Minikube - Kubernetes - Windows - Demo - SparkPi not found

3/17/2018

I am trying to follow this but I am encountering an error.

In particular, when I run:

spark-submit.cmd --master k8s://https://192.168.1.40:8443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=spark:spark --conf spark.kubernetes.driver.pod.name=spark-pi-driver local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

I get:

2018-03-17 02:09:00 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
         pod name: spark-pi-driver
         namespace: default
         labels: spark-app-selector -> spark-798e78e46c5c4a11870354b4b89602c0, spark-role -> driver
         pod uid: c6de9eb7-297f-11e8-b458-00155d735103
         creation time: 2018-03-17T01:09:00Z
         service account name: default
         volumes: default-token-m4k7h
         node name: minikube
         start time: 2018-03-17T01:09:00Z
         container images: spark:spark
         phase: Failed
         status: [ContainerStatus(containerID=docker://5c3a1c81333b9ee42a4e41ef5c83003cc110b37b4e0b064b0edffbfcd3d823b8, image=spark:spark, imageID=docker://sha256:92e664ebc1612a34d3b0cc7522615522805581ae10b60ebf8c144854f4207c06, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://5c3a1c81333b9ee42a4e41ef5c83003cc110b37b4e0b064b0edffbfcd3d823b8, exitCode=1, finishedAt=Time(time=2018-03-17T01:09:01Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2018-03-17T01:09:01Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]

With kubectl logs -f spark-pi-driver telling me that:

C:\spark-2.3.0-bin-hadoop2.7>kubectl logs -f spark-pi-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver
+ '[' -z driver ']'
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n '/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.executor.instances=1 -Dspark.driver.port=7078 -Dspark.driver.blockManager.port=7079 -Dspark.submit.deployMode=cluster -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar,/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar -Dspark.app.id=spark-798e78e46c5c4a11870354b4b89602c0 -Dspark.kubernetes.container.image=spark:spark -Dspark.master=k8s://https://192.168.1.40:8443 -Dspark.kubernetes.executor.podNamePrefix=spark-pi-fb36460b4e853cc78f4f7ec4d9ec8d0a -Dspark.app.name=spark-pi -Dspark.driver.host=spark-pi-fb36460b4e853cc78f4f7ec4d9ec8d0a-driver-svc.default.svc -Dspark.kubernetes.driver.pod.name=spark-pi-driver -cp ':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=172.17.0.4 org.apache.spark.examples.SparkPi
Error: Could not find or load main class org.apache.spark.examples.SparkPi

It cannot find the SparkPi class. Yet, when I explore the spark:spark container, the JAR is inside:

/opt/spark/examples/jars:
spark-examples_2.11-2.3.0.jar

So the image was built correctly...
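For reference, this is roughly how the image contents can be listed from the host (assuming the spark:spark image is available to the local Docker daemon; the --entrypoint override is needed because the image's default entrypoint expects a Spark-on-k8s command such as driver or executor):

docker run --rm --entrypoint ls spark:spark /opt/spark/examples/jars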

Any ideas what's wrong?

Help!!!

Edit

I have been doing some more testing. I set up an AKS cluster in Azure and launched the same Docker image, getting the same error. I was following these instructions, but using the same Docker image as locally, pushed through ACR.

Also, for the AKS case the JAR was uploaded to Blob Storage and referenced by URL. Still, I got the exact same error.
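For the AKS attempt, the submission looked roughly like this (the API server address, ACR registry, storage account and container names below are placeholders, not the exact values I used):

spark-submit.cmd --master k8s://https://<aks-api-server>:443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=<acr-name>.azurecr.io/spark:spark https://<storage-account>.blob.core.windows.net/<container>/spark-examples_2.11-2.3.0.jar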

This makes me think the error lies in the way I build the image itself, or the way I build the JAR, rather than in some configuration of the cluster itself.

Yet, no cigar.

Any ideas, or even a URL to a working Spark 2.3 image, would be welcome. I built the image on Windows. I will try building it on Linux shortly; maybe that has been the problem all along...

Thx

-- Ikos
apache-spark
kubernetes
windows

2 Answers

4/14/2018

I found a solution: I tried running the example from WSL (Windows Subsystem for Linux) and it works.

It looks like a bug in the Spark 2.3 release: it cannot set up the classpath correctly when the job is submitted from a Windows environment. Spark-on-k8s itself does not have this problem and works out of the box on Linux. So I found a workaround to schedule Spark 2.3 jobs:

  1. Set up WSL
  2. Set up a Spark environment inside WSL
  3. Run in bash (a quick check of the result is sketched below): ./spark-submit --master k8s://https://192.168.1.40:8443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=spark:spark --conf spark.kubernetes.driver.pod.name=spark-pi-driver local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
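To check whether the WSL submission actually succeeded (assuming the same spark-pi-driver pod name as in the question):

kubectl get pod spark-pi-driver
kubectl logs spark-pi-driver

The driver pod should reach the Completed state instead of Failed.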
-- Dmitry
Source: StackOverflow

6/8/2018

I know the topic is 3 months old, but since I had a similar issue and didn't find any valid answer, I'll post mine; maybe it will help others:

As pointed out here http://mail-archives.apache.org/mod_mbox/spark-user/201804.mbox/%3cCAAOnQ7v-oeWeW-VMtV5fuonjPau8vafzQPheypzjv+2M8aEp=Q@mail.gmail.com%3e, the problem may come from the different classpath separators: spark-submit on Windows joins the classpath entries with ';', but the JVM inside the Linux container expects ':', so the ';'-joined string is treated as a single, non-existent path and the class is never found. To test this, I ended up modifying /kubernetes/dockerfiles/spark/Dockerfile from the official Spark-Hadoop package. I added these 2 lines directly before ENV SPARK_HOME /opt/spark and my job could start:

COPY examples/jars/spark-examples_2.11-2.3.0.jar /opt/spark/jars
COPY examples/jars/scopt_2.11-3.7.0.jar /opt/spark/jars

It's a workaround rather than a proper solution, but at least it lets me run the tests.

And the spark-submit command looks like:

./bin/spark-submit.cmd --master k8s://localhost:6445 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=spark:latest --conf spark.app.name=spark-pi local:///opt/spark/jars/spark-examples_2.11-2.3.0.jar

And I build the Docker image like this: docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
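After rebuilding, a quick sanity check that the example JAR really ended up in /opt/spark/jars (overriding the image entrypoint, since it expects a Spark-on-k8s command):

docker run --rm --entrypoint ls spark:latest /opt/spark/jars | grep spark-examples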

-- Bartosz Konieczny
Source: StackOverflow