I'm working to set up Jupyter notebook servers on Kubernetes that are able to launch PySpark. Each user can have multiple servers running at once and reaches each one by navigating to the appropriate host followed by a path built from the server's fully-qualified name, for example: http://<hostname>/<username>/<notebook server name>.
I have a top-level function that lets a user create a SparkSession that points to the Kubernetes master URL and sets their pod as the Spark driver.
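A minimal sketch of what such a function might look like, assuming client mode with the notebook pod acting as the driver. The function names, the API-server URL, and the image name are hypothetical placeholders; the configuration keys are standard Spark-on-Kubernetes settings.

```python
def make_driver_conf(master_url, namespace, image, pod_ip):
    """Build Spark settings that run the driver inside the notebook pod.

    All argument values are assumptions supplied by the caller.
    """
    return {
        # Kubernetes master URLs take the form k8s://https://<api-server>.
        "spark.master": f"k8s://{master_url}",
        # Client mode: the driver runs in this (notebook) process.
        "spark.submit.deployMode": "client",
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        # Executors connect back to the driver on the pod's IP.
        "spark.driver.host": pod_ip,
        "spark.driver.bindAddress": "0.0.0.0",
    }


def create_spark_session(conf):
    """Apply the settings to a SparkSession (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

From a notebook, something like `create_spark_session(make_driver_conf("https://kubernetes.default.svc", "workspace", "my-registry/spark-py:latest", pod_ip))` would then start the session, where `pod_ip` is however your pods expose their own IP (for example via the Downward API).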
This is all well and good, but I would like end users to be able to reach the Spark Web UI so that they can track their jobs. The Spark on Kubernetes documentation recommends port forwarding for this. It seems to me that, for any security-minded organization, letting arbitrary users set up port forwarding this way would be unacceptable.
I would like to use a Kubernetes Ingress definition to allow external access to the driver's Spark Web UI. I've set up something like the following:
# Service
apiVersion: v1
kind: Service
metadata:
  namespace: <notebook namespace>
  name: <username>-<notebook server name>-svc
spec:
  type: ClusterIP
  sessionAffinity: None
  selector:
    app: <username>-<notebook server name>-notebook
  ports:
  - name: app-svc-port
    protocol: TCP
    port: 8888
    targetPort: 8888
  - name: spark-ui-port
    protocol: TCP
    port: 4040
    targetPort: 4040
---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: workspace
  name: <username>-<notebook server name>-ing
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: <hostname>
    http:
      paths:
      - path: /<username>/<notebook server name>
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: app-svc-port
      - path: /<username>/<notebook server name>/spark-ui
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: spark-ui-port
However, with this setup, when I navigate to http://<hostname>/<username>/<notebook server name>/spark-ui/, I'm redirected to http://<hostname>/jobs. This is because /jobs is the default entry point of Spark's Web UI. I don't have an Ingress rule for that path, and I can't add one, since every user's Web UI would then collide with every other user's at the load balancer (unless I'm misunderstanding something, which is entirely possible).
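To make the collision concrete: a redirect to a root-relative path such as /jobs is resolved against the host alone, so the per-user part of the path is lost. This sketch mimics that resolution with Python's standard `urllib.parse.urljoin`; the hostname and the user and server names are hypothetical.

```python
from urllib.parse import urljoin


def resolve_redirect(requested_url, location):
    """Resolve a redirect's Location value against the requested URL."""
    return urljoin(requested_url, location)


# Hypothetical Spark UI URLs for two users behind the same hostname.
alice_ui = "http://example.com/alice/nb1/spark-ui/"
bob_ui = "http://example.com/bob/nb2/spark-ui/"

# A root-relative "/jobs" discards the per-user prefix,
# so both users are sent to the same path.
print(resolve_redirect(alice_ui, "/jobs"))  # http://example.com/jobs
print(resolve_redirect(bob_ui, "/jobs"))    # http://example.com/jobs

# If the redirect carried the prefix, each user would stay on their own path.
print(resolve_redirect(alice_ui, "/alice/nb1/spark-ui/jobs"))
# http://example.com/alice/nb1/spark-ui/jobs
```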
Among the Spark UI configuration settings, there doesn't seem to be a way to set a root path for the Spark session. You can change the port the UI runs on, but what I'd like to do is make the UI serve at something like http://<hostname>/<username>/<notebook server name>/spark-ui/<jobs, stages, etc>. Is there really no way to change the part of the URL between the hostname and the final path segment?
Yes, you can achieve this by setting the spark.ui.proxyBase property, either in spark-defaults.conf or at runtime.
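For the runtime route, a minimal sketch that derives the proxy base from the same /<username>/<notebook server name>/spark-ui prefix used in the Ingress rule and sets it on a SparkSession builder before the session starts. The helper names are hypothetical, and whether a builder-level setting takes effect may depend on your Spark version and how the driver JVM is launched; if it does not, fall back to spark-defaults.conf.

```python
def spark_ui_proxy_base(username, server_name):
    """Return the Ingress path prefix for this server's Spark UI.

    Mirrors the Ingress rule /<username>/<notebook server name>/spark-ui.
    """
    return f"/{username}/{server_name}/spark-ui"


def with_proxy_base(builder, username, server_name):
    """Set spark.ui.proxyBase on a SparkSession builder.

    Must be called before getOrCreate(), since the UI is configured
    when the SparkContext starts.
    """
    return builder.config(
        "spark.ui.proxyBase", spark_ui_proxy_base(username, server_name)
    )
```

Usage would look like `with_proxy_base(SparkSession.builder, "alice", "nb1").getOrCreate()`. As far as I can tell, spark.ui.proxyBase only changes the links and redirects the UI generates; the driver still serves its pages at /jobs, /stages, and so on, so the Ingress controller would also need to strip the /<username>/<notebook server name>/spark-ui prefix before forwarding requests to port 4040.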
Example:
echo "spark.ui.proxyBase $SPARK_UI_PROXYBASE" >> /opt/spark/conf/spark-defaults.conf;
Then this should work.
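One caveat with appending via `>>`: each run (for example, on every container start) adds another line for the same key. The last entry wins, but the file keeps growing. A minimal Python sketch that replaces any existing entry instead; the function names are hypothetical, and the key parsing assumes the Java properties-file separators (whitespace, `=`, or `:`) that spark-defaults.conf accepts.

```python
import re
from pathlib import Path


def _property_key(line):
    """Return the key of a properties-file line, or None for blanks/comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "!")):
        return None
    # Java properties files separate key and value with whitespace, '=' or ':'.
    return re.split(r"[\s=:]", stripped, maxsplit=1)[0]


def set_spark_default(conf_path, key, value):
    """Set `key` in spark-defaults.conf, replacing any existing entry for it."""
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop earlier entries for this key; keep comments and other settings.
    kept = [line for line in lines if _property_key(line) != key]
    kept.append(f"{key} {value}")
    path.write_text("\n".join(kept) + "\n")
```

The equivalent of the echo command above would be `set_spark_default("/opt/spark/conf/spark-defaults.conf", "spark.ui.proxyBase", os.environ["SPARK_UI_PROXYBASE"])`.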