I've deployed a Spark cluster with one master and one worker.
$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
spark-master-0              1/1     Running   0          1d
spark-worker-0              1/1     Running   0          1d
The master node is reachable behind this service:
$ kubectl get services
NAME               TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
spark-master-svc   NodePort   172.30.152.169   <none>        7077:31095/TCP,80:30021/TCP   1d
I've also deployed Zeppelin (apache/zeppelin:0.9.0):
$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
spark-master-0              1/1     Running   0          1d
spark-worker-0              1/1     Running   0          1d
zeppelin-54748cbd67-7qst8   1/1     Running   0          1h
I've edited the Spark interpreter so that it reaches this Spark cluster. As you can see, I'm telling Zeppelin that Spark is located at spark://spark-master-svc:7077.
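For reference, this is roughly the interpreter property I set; I'm using the standard Spark property name here, although the exact field label in the Zeppelin UI may differ:

```properties
# Spark interpreter setting in Zeppelin (as configured in my cluster)
spark.master=spark://spark-master-svc:7077
```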
Nevertheless, when I try to run a paragraph, I see that a new pod is created:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
spark-master-0 1/1 Running 0 1d
spark-worker-0 1/1 Running 0 1d
zeppelin-54748cbd67-7qst8 1/1 Running 0 1h
spark-txljya 0/1 Init:ImagePullBackOff 0 1h
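For what it's worth, inspecting the stuck pod should at least show which image it is failing to pull; something along these lines (pod name taken from the output above):

```
# Show the pod's events, including the image pull error
$ kubectl describe pod spark-txljya

# Print the images referenced by the pod's init and main containers
$ kubectl get pod spark-txljya \
    -o jsonpath='{.spec.initContainers[*].image} {.spec.containers[*].image}'
```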
I thought a paragraph job was submitted straight to my Spark cluster. What is this pod for?
Another question: why is a SPARK_HOME needed? I mean, Spark is running in a separate cluster, yet it seems Spark also has to be installed inside the Zeppelin pod, which is confusing to me. As you can see, I'm a bit lost...
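If I understand the docs correctly, something like the following is expected in zeppelin-env.sh; the path here is hypothetical, not what my pod actually contains:

```sh
# zeppelin-env.sh -- hypothetical sketch, path is illustrative.
# SPARK_HOME must point at a Spark distribution *inside* the Zeppelin pod:
# Zeppelin runs its spark-submit as the client, even though the master
# (spark://spark-master-svc:7077) lives in a separate cluster.
export SPARK_HOME=/opt/spark
```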
Any ideas?