spark-submit job by doing exec on a master pod in k8s

9/24/2019

I have successfully created a Spark cluster on Kubernetes with 1 master and 2 worker pods. It runs Spark v2.4.3 with Java 8 and Scala 2.11.12 on k8s, with kubectl v1.16.0 and minikube v1.4.0.

For reference, kubectl get pods shows this -

NAME                            READY   STATUS    RESTARTS   AGE
spark-master-fcfd55d7d-qrpsw    1/1     Running   0          66m
spark-worker-686bd57b5d-6s9zb   1/1     Running   0          65m
spark-worker-686bd57b5d-wrqrd   1/1     Running   0          65m

I am also able to run built-in Spark applications such as pyspark and spark-shell by exec-ing into the master pod -

kubectl exec spark-master-fcfd55d7d-qrpsw -it spark-shell

Since the environment is already set up, I am trying to run my own Spark job on it in the same way, but it is not working. The spark-submit script looks like this.

#!/usr/bin/env bash

spark-submit \
   --class com.cloudian.spark.main.RequestInfoLogStreamer \
   /Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar

The .sh script is then submitted to the master pod -

kubectl exec spark-master-fcfd55d7d-qrpsw /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh

But this gives me an error -

OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown
command terminated with exit code 126

What am I doing wrong here? My intention is to get the work done by these master and worker nodes.

-- Aniruddha Tekade
apache-spark
docker
exec
kubernetes
pyspark

1 Answer

9/25/2019

As you can read from the error:

OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown
command terminated with exit code 126

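The part that interests us most is /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory, which means the pod is unable to locate the logstreamer.sh file. kubectl exec runs the command inside the container, so the path is resolved against the container's filesystem, not your local machine's.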

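The logstreamer.sh script needs to be copied into the spark-master pod, and scala-spark-streaming_2.11-1.0.jar needs to be there as well. The jar path inside the script must then point to the jar's location in the pod rather than to /Users/atekade/....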
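One way to get the files into the pod is kubectl cp. The following is only a minimal sketch of that idea, not a tested recipe for this cluster: the function name submit_to_master and the REMOTE_DIR value /tmp are hypothetical, and it assumes the container image ships tar (which kubectl cp needs) and has spark-submit on its PATH.

```shell
#!/usr/bin/env bash
# Sketch: copy a job jar into the master pod, then run spark-submit inside it.
# REMOTE_DIR and submit_to_master are hypothetical names chosen for illustration.

REMOTE_DIR="/tmp"   # assumed to be a writable directory inside the container

submit_to_master() {
  local pod="$1" jar="$2" main_class="$3"
  local remote_jar="${REMOTE_DIR}/$(basename "$jar")"

  # kubectl cp copies from the local machine into the pod's filesystem.
  kubectl cp "$jar" "${pod}:${remote_jar}" || return 1

  # The path given to spark-submit must be the path inside the pod.
  kubectl exec "$pod" -- spark-submit --class "$main_class" "$remote_jar"
}
```

With the names from the question, this would be called as submit_to_master spark-master-fcfd55d7d-qrpsw /Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar com.cloudian.spark.main.RequestInfoLogStreamer. Note that spark-submit falls back to local mode when no master is configured, so if the job should run on the workers you may also need to pass --master with your cluster's master URL.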

You can configure a PersistentVolume for storage. This is useful because if your pod is ever rescheduled, all data that was not stored on a PV will be lost.

Here is a link to minikube documentation for Persistent Volumes.

You can also use different Storage Classes.

-- Crou
Source: StackOverflow