Zeppelin and Spark in Kubernetes


I've deployed a Spark cluster with one master and one worker.

$ kubectl get pods
NAME                         READY   STATUS                  RESTARTS   AGE
spark-master-0               1/1     Running                 0          1d
spark-worker-0               1/1     Running                 0          1d

The master node is exposed through this service:

$ kubectl get services
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                  AGE
spark-master-svc       NodePort                     <none>        7077:31095/TCP,80:30021/TCP              1d

I've also deployed Zeppelin, using the apache/zeppelin:0.9.0 image:

$ kubectl get pods
NAME                         READY   STATUS                  RESTARTS   AGE
spark-master-0               1/1     Running                 0          1d
spark-worker-0               1/1     Running                 0          1d
zeppelin-54748cbd67-7qst8    1/1     Running                 0          1h

I've edited the Spark interpreter settings so that it points at this cluster: the master is set to spark://spark-master-svc:7077.

Nevertheless, when I run a paragraph, a new pod is created:

$ kubectl get pods
NAME                         READY   STATUS                  RESTARTS   AGE
spark-master-0               1/1     Running                 0          1d
spark-worker-0               1/1     Running                 0          1d
zeppelin-54748cbd67-7qst8    1/1     Running                 0          1h
spark-txljya                 0/1     Init:ImagePullBackOff   0          1h

I thought a paragraph job would be submitted directly to my Spark cluster. What is this pod for?

Another question: why is SPARK_HOME needed? Spark is running in a separate cluster, yet it seems that Spark also has to be installed inside the Zeppelin pod. That's confusing to me.

As you can see, I'm a bit lost...

Any ideas?

-- Jordi

