Spark/k8s: How do I install Spark 2.4 on an existing Kubernetes cluster, in client mode?

3/11/2019

I want to install Apache Spark v2.4 on my Kubernetes cluster, but there does not seem to be a stable Helm chart for this version. An older, stable chart (for v1.5.1) exists at

https://github.com/helm/charts/tree/master/stable/spark

How can I create/find a v2.4 chart?
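For what it's worth, Spark 2.4 may not strictly need a chart: its built-in Kubernetes scheduler backend requests pods directly from the API server, using container images built with the `bin/docker-image-tool.sh` script that ships in the binary distribution. The sketch below only assembles that script's build and push commands. The distribution path, the registry `docker.io/myuser`, the tag, and the helper names are hypothetical placeholders, and the `spark-py` image name for PySpark is an assumption to check against the script's `-h` output.

```python
import shlex
from typing import List


def image_build_commands(spark_home: str, repo: str, tag: str) -> List[List[str]]:
    """Return argv lists that build, then push, Spark images using the
    docker-image-tool.sh script from a Spark 2.4 binary distribution."""
    tool = f"{spark_home.rstrip('/')}/bin/docker-image-tool.sh"
    common = [tool, "-r", repo, "-t", tag]
    return [common + ["build"], common + ["push"]]


def image_name(repo: str, tag: str, flavor: str = "spark") -> str:
    """Image reference for executor pods; 'spark' is the JVM image.
    Assumption: the PySpark image uses the flavor 'spark-py'."""
    return f"{repo}/{flavor}:{tag}"


if __name__ == "__main__":
    # Hypothetical install path and registry; adjust to your environment.
    for argv in image_build_commands(
        "/opt/spark-2.4.0-bin-hadoop2.7", "docker.io/myuser", "v2.4.0"
    ):
        # Print a shell-safe command line for each step.
        print(" ".join(shlex.quote(a) for a in argv))
    print(image_name("docker.io/myuser", "v2.4.0", "spark-py"))
```

Running the printed commands requires Docker and push access to the registry. The resulting image would then be referenced through the `spark.kubernetes.container.image` property.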

Second: I need v2.4 because it is the first release that supports client mode on Kubernetes, and I would like to submit PySpark jobs (from a Jupyter notebook) to the cluster from my laptop's dev environment. What extra steps are required to enable client mode (including exposing the service)?
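As a rough sketch of what client mode involves in 2.4: the notebook process on the laptop becomes the driver, the `k8s://` master URL points at the API server, and the executor pods must be able to open connections back to the driver's host and ports. The helper below (a hypothetical name) only assembles those Spark properties. The API server URL, image, driver address and the fixed ports 29413/29414 are placeholders; pinning the ports is an assumption made to simplify firewall or NAT rules, not something Spark requires.

```python
from typing import Dict


def k8s_client_mode_conf(
    api_server: str,
    image: str,
    driver_host: str,
    namespace: str = "default",
    executors: int = 2,
    driver_port: int = 29413,
    block_manager_port: int = 29414,
) -> Dict[str, str]:
    """Spark properties for a client-mode driver that launches executors on Kubernetes."""
    return {
        # Kubernetes master URLs are prefixed with k8s://
        "spark.master": f"k8s://{api_server}",
        "spark.submit.deployMode": "client",
        # Image used for the executor pods
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.executor.instances": str(executors),
        # Address advertised to executors; must be routable from inside the pods
        "spark.driver.host": driver_host,
        # Listen on all local interfaces while advertising driver_host
        "spark.driver.bindAddress": "0.0.0.0",
        # Fixed (hypothetical) ports so they can be opened or forwarded explicitly
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
    }


if __name__ == "__main__":
    conf = k8s_client_mode_conf(
        api_server="https://192.168.99.100:8443",  # placeholder API server
        image="docker.io/myuser/spark-py:v2.4.0",  # placeholder image
        driver_host="10.0.0.5",  # placeholder laptop IP reachable from the pods
    )
    for key in sorted(conf):
        print(f"{key}={conf[key]}")
```

In a notebook, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`; the same pairs also work as `--conf key=value` arguments to `spark-submit`. The hard part from a laptop is usually networking: if the laptop sits behind NAT, the executor pods cannot reach `spark.driver.host`, and running the notebook itself inside the cluster may be simpler.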

The closest attempt I have found so far, which targets Spark v2.0.0 and which I haven't yet got working, is at

https://github.com/Uninett/kubernetes-apps/tree/master/spark

https://github.com/phatak-dev/kubernetes-spark is also two years old, and it says nothing about deploying Jupyter.

Pangeo-specific: https://discourse.jupyter.org/t/spark-integration-documentation/243

Related GitHub issue: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1030

I have searched for up-to-date resources on this but have found nothing that has everything in one place. I will update this question with other relevant links if and when people are able to point them out to me. Hopefully it will be possible to cobble together an answer.

As ever, huge thanks in advance.

Update:

https://github.com/SnappyDataInc/spark-on-k8s (for v2.2) is extremely easy to deploy and looks promising.

-- jtlz2
apache-spark
jupyter
jupyterhub
kubernetes
pyspark
