Zeppelin+Spark+Kubernetes: Let Zeppelin jobs run on an existing Spark cluster

8/10/2020

In a k8s cluster, how do you configure Zeppelin to run Spark jobs on an existing Spark cluster instead of spinning up a new pod?

I've got a k8s cluster up and running in which I want to run Spark with Zeppelin.

Spark is deployed using the official bitnami/spark Helm chart (v3.0.0). I've got one master and two worker pods running fine, everything is good.
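For context, the deployment is essentially the stock chart, something along these lines (the release name is a placeholder, not necessarily what I used):

helm repo add bitnami https://charts.bitnami.com/bitnami
# pick whichever chart version ships Spark 3.0.0
helm install my-spark bitnami/spark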

Zeppelin is deployed with the zeppelin-server.yaml from the official Apache Zeppelin GitHub repository.

I've built my own Zeppelin container from apache/zeppelin:0.9.0 without much modification.

Short pseudo Dockerfile:

FROM bitnami/spark:3.0.0 AS spark
FROM apache/zeppelin:0.9.0
COPY --from=spark /opt/bitnami/spark/ /opt/bitnami/spark/
# plus a RUN step that installs kubectl (sketched below)
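The kubectl step boils down to something like the following sketch (the version is pinned arbitrarily here and the URL is just the standard upstream release location, not necessarily what my actual image uses; the base image needs curl, and you may have to switch to root for the install):

RUN curl -LO https://dl.k8s.io/release/v1.18.8/bin/linux/amd64/kubectl \
    && chmod +x kubectl \
    && mv kubectl /usr/local/bin/kubectl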

I modified zeppelin-server.yaml slightly (image, imagePullSecrets, and the Spark master set to the headless service DNS of the Spark master).
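Roughly, the parts I touched look like this (image, secret name, and service DNS are placeholders for my actual values; the way the master address is wired in is only an illustration of the kind of change, not the literal upstream manifest structure):

spec:
  template:
    spec:
      imagePullSecrets:
        - name: my-registry-secret                        # placeholder
      containers:
        - name: zeppelin-server
          image: my-registry/zeppelin-with-spark:0.9.0    # the custom image from the Dockerfile above
          env:
            - name: SPARK_MASTER
              value: spark://my-spark-master-svc.default.svc.cluster.local:7077   # the chart's master service, adjust to your release/namespace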

Now I want my Zeppelin jobs to run on my existing Spark cluster, but so far with no success.

When I submit Zeppelin jobs (for the Spark interpreter), Zeppelin fires up a new Spark pod and works solely with that one. The Spark interpreter settings look like they should: the Spark master URL is set (spark://<master-url>:<master-port>), and so is the Spark home.
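Concretely, the relevant interpreter properties look something like this (the endpoint is a placeholder for my master service):

master      spark://my-spark-master-svc.default.svc.cluster.local:7077
SPARK_HOME  /opt/bitnami/spark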

While this is kind of a sweet behaviour, it's not what I want.

What I want (and what my question is) is: I want my Zeppelin pod to submit the Spark jobs to the existing cluster, not fire up a new pod. I am PRETTY sure that there has to be some config/env/whatever that I have to set, but I simply can't find it.

So I wanna ask: is there anyone out there who knows how to run Zeppelin Spark jobs on an existing Spark cluster? I thought setting the Spark master should do the job...

Kind regards, Bob

-- Rockbob
apache-spark
apache-zeppelin
kubernetes

1 Answer

10/19/2020

Answering my own question after a while...

For anyone running into the same problem:

1) Go into the Spark interpreter settings.

2) (Optional, if you haven't already got the property) Press "edit", scroll down, and add the property SPARK_SUBMIT_OPTIONS.

3) Edit the SPARK_SUBMIT_OPTIONS value and add "--master spark://<ENDPOINT OF YOUR SPARK MASTER>" (see the example below).

4) Save the settings and done...
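With my setup, the value ended up looking something like this (the endpoint is a placeholder; use your own master service and port):

--master spark://my-spark-master-svc.default.svc.cluster.local:7077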

This threw me off massively, as there's already an option to set the Spark master itself.

What solved the problem was entering the Spark master twice:

1) Under the key "master"

2) In SPARK_SUBMIT_OPTIONS, as described above.

-- Rockbob
Source: StackOverflow