I deployed Spark on Kubernetes with
helm install microsoft/spark --version 1.0.0
(I also tried the Bitnami chart, with the same result.)
Then I followed the guide at https://spark.apache.org/docs/latest/running-on-kubernetes.html#submitting-applications-to-kubernetes
I went to $SPARK_HOME/bin and ran:
docker-image-tool.sh -r -t my-tag build
This returns:
Cannot find docker image. This script must be run from a runnable distribution of Apache Spark.
But all the Spark executables are in this directory:
bash-4.4# cd $SPARK_HOME/bin
bash-4.4# ls
beeline find-spark-home.cmd pyspark.cmd spark-class spark-shell.cmd spark-sql2.cmd sparkR
beeline.cmd load-spark-env.cmd pyspark2.cmd spark-class.cmd spark-shell2.cmd spark-submit sparkR.cmd
docker-image-tool.sh load-spark-env.sh run-example spark-class2.cmd spark-sql spark-submit.cmd sparkR2.cmd
find-spark-home pyspark run-example.cmd spark-shell spark-sql.cmd spark-submit2.cmd
Any suggestions on what I am doing wrong? I haven't changed any other Spark configuration. Am I missing something? Should I install Docker myself, or any other tools?
You are mixing two things here.
When you run
helm install microsoft/spark --version 1.0.0
you are deploying Spark with all its prerequisites inside Kubernetes. Helm does all the hard work for you. After you run this, Spark is ready to use.
What you did next was try to deploy Spark a second time, by hand, from inside a Spark pod that is already running on Kubernetes.
These are two different approaches that are not meant to be mixed. The guide you linked explains how to run Spark on Kubernetes by hand; fortunately, the same result can be achieved with Helm, as you already did.
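If you want to see the difference for yourself, here is a minimal sketch. The linked guide says the Dockerfile ships in the kubernetes/dockerfiles/ directory of a Spark distribution, so you can check whether that directory exists under $SPARK_HOME. The helper name has_spark_dockerfiles is made up for illustration, and whether the script's error comes from exactly this check in your Spark version is an assumption on my part.

```shell
# Hypothetical helper (not part of Spark): succeeds only if the given
# directory contains the kubernetes/dockerfiles folder that the guide
# describes for a full Spark distribution.
has_spark_dockerfiles() {
  dist_dir="${1:-${SPARK_HOME:-}}"
  [ -d "$dist_dir/kubernetes/dockerfiles" ]
}

# Usage:
#   if has_spark_dockerfiles "$SPARK_HOME"; then
#     echo "Dockerfiles found"
#   else
#     echo "Dockerfiles missing: this is not a full distribution"
#   fi
```

If the directory is missing inside the pod, the image simply was not built to be used this way, which fits the point above: the pod is there to run Spark, not to build Spark images.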
When you run
helm install myspark microsoft/spark --version 1.0.0
the output tells you how to access the Spark web UI:
NAME: myspark
LAST DEPLOYED: Wed Apr 8 08:01:39 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Get the Spark URL to visit by running these commands in the same shell:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc --namespace default -w myspark-webui'
export SPARK_SERVICE_IP=$(kubectl get svc --namespace default myspark-webui -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SPARK_SERVICE_IP:8080
2. Get the Zeppelin URL to visit by running these commands in the same shell:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc --namespace default -w myspark-zeppelin'
export ZEPPELIN_SERVICE_IP=$(kubectl get svc --namespace default myspark-zeppelin -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$ZEPPELIN_SERVICE_IP:8080
Let's check it:
$ export SPARK_SERVICE_IP=$(kubectl get svc --namespace default myspark-webui -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo http://$SPARK_SERVICE_IP:8080
http://34.70.212.182:8080
If you open this URL, you will find the Spark web UI ready to use.
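As the Helm notes say, the LoadBalancer IP can take a few minutes to appear, and until then the jsonpath query returns an empty string. Below is a small sketch that polls until the IP is assigned. The function name wait_for_lb_ip and the retry defaults are my own choices; the kubectl query is the one from the Helm output, and the service name and namespace assume the myspark release shown above.

```shell
# Hypothetical helper: print the LoadBalancer IP of a service once it is
# assigned, or return 1 if it never shows up within the retry budget.
# Arguments: service name, namespace (default "default"),
#            number of attempts (default 30), seconds between attempts (default 10).
wait_for_lb_ip() {
  lb_svc="$1"
  lb_ns="${2:-default}"
  lb_tries="${3:-30}"
  lb_delay="${4:-10}"
  lb_i=0
  while [ "$lb_i" -lt "$lb_tries" ]; do
    # Same query as in the Helm notes; empty output means "not assigned yet".
    lb_ip="$(kubectl get svc --namespace "$lb_ns" "$lb_svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$lb_ip" ]; then
      echo "$lb_ip"
      return 0
    fi
    lb_i=$((lb_i + 1))
    sleep "$lb_delay"
  done
  return 1
}

# Usage:
#   SPARK_SERVICE_IP="$(wait_for_lb_ip myspark-webui default)" \
#     && echo "http://$SPARK_SERVICE_IP:8080"
```

On clusters without a LoadBalancer implementation the IP may never be assigned, in which case the function returns a non-zero status after the last attempt.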