I have successfully created a Spark cluster on Kubernetes with 1 master and 2 worker pods. It runs Spark v2.4.3 with Java 8 and Scala 2.11.12, on kubectl v1.16.0 and minikube v1.4.0.
For details, kubectl get pods shows this:
NAME                            READY   STATUS    RESTARTS   AGE
spark-master-fcfd55d7d-qrpsw    1/1     Running   0          66m
spark-worker-686bd57b5d-6s9zb   1/1     Running   0          65m
spark-worker-686bd57b5d-wrqrd   1/1     Running   0          65m
I am also able to run the built-in Spark applications such as pyspark and spark-shell by exec'ing into the master pod:
kubectl exec spark-master-fcfd55d7d-qrpsw -it spark-shell
Since the environment is already set up, I am trying to run my own Spark job on this cluster in the same way, but it is not working. The spark-submit command looks like this:
#!/usr/bin/env bash
spark-submit \
--class com.cloudian.spark.main.RequestInfoLogStreamer \
/Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar
The .sh script is then submitted to the master pod:
kubectl exec spark-master-fcfd55d7d-qrpsw /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh
But this gives me an error:
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown
command terminated with exit code 126
What am I doing wrong here? My intention is to have the work done by these master and worker nodes.
As you can read from the error:
OCI runtime exec failed: exec failed: container_linux.go:345: starting container process caused "exec: \"/Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh\": stat /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory": unknown command terminated with exit code 126
The part that interests us most is /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh: no such file or directory, which means the pod is unable to locate the logstreamer.sh file: that path exists on your local machine, not inside the container's filesystem.
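You can confirm this by listing the path from inside the pod; the ls will fail, because /Users/... only exists on your local machine:

kubectl exec spark-master-fcfd55d7d-qrpsw -- ls -l /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh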
The logstreamer.sh script needs to be uploaded to the spark-master pod, and the scala-spark-streaming_2.11-1.0.jar needs to be there as well.
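One way to get both files into the pod is kubectl cp; the /tmp destination below is just an assumption, any writable directory inside the container will do:

# Copy the script and the jar from the local machine into the master pod
kubectl cp /Users/atekade/IdeaProjects/scala-spark-streaming/logstreamer.sh spark-master-fcfd55d7d-qrpsw:/tmp/logstreamer.sh
kubectl cp /Users/atekade/IdeaProjects/scala-spark-streaming/target/scala-2.11/scala-spark-streaming_2.11-1.0.jar spark-master-fcfd55d7d-qrpsw:/tmp/scala-spark-streaming_2.11-1.0.jar

# Point the jar path inside logstreamer.sh at the in-pod location
# (/tmp/scala-spark-streaming_2.11-1.0.jar), then run the script in the pod:
kubectl exec spark-master-fcfd55d7d-qrpsw -- /bin/bash /tmp/logstreamer.sh

Remember to edit logstreamer.sh so that spark-submit references the jar at its in-pod path rather than the local macOS path.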
You can configure a PersistentVolume for storage. This is useful because if your pod is ever rescheduled, all data that was not stored on a PV will be lost.
See the minikube documentation on Persistent Volumes: https://minikube.sigs.k8s.io/docs/handbook/persistent_volumes/
You can also use different Storage Classes.
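For example, a minimal hostPath PersistentVolume with a matching claim on minikube could be created like this (the names, size, and host path are illustrative; the claim would then be mounted into the spark-master deployment via volumes/volumeMounts):

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spark-pv              # illustrative name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/spark-pv      # directory inside the minikube VM; /data survives reboots
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc             # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF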