spark docker-image-tool Cannot find docker image

4/8/2020

I deployed Spark on Kubernetes with helm install microsoft/spark --version 1.0.0 (I also tried the Bitnami chart, with the same result).

Then, as described at https://spark.apache.org/docs/latest/running-on-kubernetes.html#submitting-applications-to-kubernetes,

I go to $SPARK_HOME/bin and run:

docker-image-tool.sh -r -t my-tag build

This returns: Cannot find docker image. This script must be run from a runnable distribution of Apache Spark.

But all the Spark executables are in this directory:

bash-4.4# cd $SPARK_HOME/bin
bash-4.4# ls
beeline               find-spark-home.cmd   pyspark.cmd           spark-class           spark-shell.cmd       spark-sql2.cmd        sparkR
beeline.cmd           load-spark-env.cmd    pyspark2.cmd          spark-class.cmd       spark-shell2.cmd      spark-submit          sparkR.cmd
docker-image-tool.sh  load-spark-env.sh     run-example           spark-class2.cmd      spark-sql             spark-submit.cmd      sparkR2.cmd
find-spark-home       pyspark               run-example.cmd       spark-shell           spark-sql.cmd         spark-submit2.cmd

Any suggestions on what I am doing wrong? I haven't made any other configuration changes to Spark; am I missing something? Should I install Docker myself, or any other tools?

-- rigby
apache-spark
kubernetes

1 Answer

4/8/2020

You are mixing two different things here.

When you run helm install microsoft/spark --version 1.0.0, you deploy Spark with all its prerequisites inside Kubernetes; Helm does the hard work for you. After you run this command, Spark is ready to use.

Then, after deploying Spark with Helm, you are trying to deploy Spark again, from inside a Spark pod that is already running on Kubernetes.

These are two different approaches that are not meant to be mixed. The guide you linked explains how to run Spark on Kubernetes by hand, but fortunately the same result can be achieved with Helm, as you did before.
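For completeness, the error message itself comes from a precondition check in docker-image-tool.sh: the script expects to be run from the root of a full downloaded Spark distribution, which ships its Dockerfiles under kubernetes/dockerfiles, while the trimmed-down install inside the Helm pod only contains bin/ and a few other directories. A minimal sketch of that check (the function and variable names are illustrative, not the real script's):

```shell
#!/usr/bin/env bash
# Sketch of the precondition docker-image-tool.sh enforces: a runnable
# distribution ships its Dockerfiles under kubernetes/dockerfiles.
# (Illustrative names; the real script's internals differ.)
check_distribution() {
  local spark_home="$1"
  if [ -d "$spark_home/kubernetes/dockerfiles" ]; then
    echo "runnable distribution"
  else
    echo "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
  fi
}

# A Helm pod's trimmed install (only bin/ etc.) fails the check:
check_distribution "$(mktemp -d)"
```

So if you do want to build the images by hand, download a binary distribution from spark.apache.org, cd into its top-level directory (not bin/), and run ./bin/docker-image-tool.sh -r <your-repo> -t my-tag build. Note also that -r requires a repository argument, which is missing in the command from the question.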

When you run helm install myspark microsoft/spark --version 1.0.0, the output tells you how to access your Spark web UI:

NAME: myspark
LAST DEPLOYED: Wed Apr  8 08:01:39 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Get the Spark URL to visit by running these commands in the same shell:

  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch the status of by running 'kubectl get svc --namespace default -w myspark-webui'

  export SPARK_SERVICE_IP=$(kubectl get svc --namespace default myspark-webui -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SPARK_SERVICE_IP:8080

2. Get the Zeppelin URL to visit by running these commands in the same shell:

  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch the status of by running 'kubectl get svc --namespace default -w myspark-zeppelin'

  export ZEPPELIN_SERVICE_IP=$(kubectl get svc --namespace default myspark-zeppelin -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$ZEPPELIN_SERVICE_IP:8080

Let's check it:

$ export SPARK_SERVICE_IP=$(kubectl get svc --namespace default myspark-webui -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo http://$SPARK_SERVICE_IP:8080
http://34.70.212.182:8080

If you open this URL, your Spark web UI is ready.
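One caveat: on some clusters (AWS ELB, for example) the LoadBalancer publishes a hostname rather than an IP, so the .ip jsonpath above comes back empty while .status.loadBalancer.ingress[0].hostname is populated. A small sketch of building the URL from whichever field is set (the helper is hypothetical, not part of the chart; feed it the output of the two kubectl jsonpath queries):

```shell
#!/usr/bin/env bash
# Build the web UI URL from whichever LoadBalancer field is populated:
# .status.loadBalancer.ingress[0].ip or ...ingress[0].hostname.
# (Hypothetical helper for illustration.)
ui_url() {
  local ip="$1" host="$2"
  if [ -n "$ip" ]; then
    echo "http://$ip:8080"
  else
    echo "http://$host:8080"
  fi
}

ui_url "34.70.212.182" ""                    # → http://34.70.212.182:8080
ui_url "" "myservice.us-east-1.elb.amazonaws.com"
```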

[Screenshot: Spark web UI]

-- mWatney
Source: StackOverflow