Apache Spark remote cluster on JupyterHub notebooks on k8s

12/1/2020

I have:

Apache Spark: 2.4.4

JupyterHub: 1.1.0

Helm chart version: 0.9.0

K8S: 1.15

I built JupyterHub on k8s following the official docs: https://zero-to-jupyterhub.readthedocs.io/

I use the official Spark notebook image for some local jobs: jupyter/all-spark-notebook:latest

Spark works well in local mode.
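
For example, this kind of cell runs fine in local mode (a minimal sketch; the app name is just a placeholder):

    from pyspark.sql import SparkSession

    # Local mode: driver and executors all run inside the notebook container.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-test")
        .getOrCreate()
    )
    print(spark.range(100).count())  # prints 100
    spark.stop()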

But I want to use a JupyterHub notebook to run jobs on a remote (homemade) Apache Spark cluster, with K8s as the orchestrator.
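
Something like this is what I have in mind (a sketch only; the API server URL, namespace, image, and driver host below are hypothetical placeholders, not my real values):

    from pyspark.sql import SparkSession

    # Client mode: this notebook process acts as the Spark driver, and
    # executors are launched as pods by the k8s scheduler.
    spark = (
        SparkSession.builder
        .master("k8s://https://my-k8s-apiserver:6443")
        .appName("jupyterhub-remote-spark")
        .config("spark.submit.deployMode", "client")
        .config("spark.kubernetes.namespace", "spark")
        .config("spark.kubernetes.container.image", "my-registry/spark-py:2.4.4")
        .config("spark.executor.instances", "2")
        # Executors must be able to reach back to the driver pod, e.g.
        # through a headless service pointing at the notebook pod.
        .config("spark.driver.host", "my-notebook-svc.spark.svc.cluster.local")
        .config("spark.driver.port", "29413")
        .getOrCreate()
    )
    print(spark.range(1000).count())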

I already tried Apache Zeppelin and it works well, but I want to do the same thing with JupyterHub.

How can I do this?

-- sacha.p
apache-spark
jupyter-notebook
jupyterhub
kubernetes

1 Answer

12/6/2020

I understand your pain. I burned a lot of time getting a Spark cluster + Jupyter server working together.

Try using a docker-compose.yaml like mine.
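
My exact file may differ from yours; a minimal sketch of that kind of setup (assuming the bitnami/spark image for the master and worker, and the jupyter/all-spark-notebook image from your question) looks like this:

    version: "3"
    services:
      spark-master:
        image: bitnami/spark:2.4.4    # assumed tag; match your Spark version
        environment:
          - SPARK_MODE=master
        ports:
          - "8080:8080"    # master web UI
          - "7077:7077"    # master RPC port, used by workers and drivers
      spark-worker:
        image: bitnami/spark:2.4.4
        environment:
          - SPARK_MODE=worker
          - SPARK_MASTER_URL=spark://spark-master:7077
        depends_on:
          - spark-master
      jupyter:
        image: jupyter/all-spark-notebook:latest
        ports:
          - "7777:8888"    # notebook served on host port 7777
        depends_on:
          - spark-master

Bring it up with: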

docker-compose up -d

To get the Jupyter token, run:

docker-compose logs jupyter
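
Look for a line like this in the output (the token value is a placeholder):

    jupyter_1  | [I 10:00:00.000 NotebookApp] http://127.0.0.1:8888/?token=<your-token>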

Copy the URL that starts with 127.0.0.1, including the token, and paste it into your browser, changing the port to 7777.

You will see an empty Jupyter page. Create a new notebook and run a cell like the one below (the original answer shows this as a screenshot: "new notebook with spark shell").
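
For example (assuming the spark-master service name from the compose sketch above):

    from pyspark.sql import SparkSession

    # "spark-master" resolves on docker-compose's internal network;
    # 7077 is the standalone master's RPC port.
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")
        .appName("compose-test")
        .getOrCreate()
    )
    print(spark.range(100).count())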

Enjoy using Jupyter with Spark!

Hope this helps.

-- ozlevka
Source: StackOverflow