Change root path for Spark Web UI?

5/29/2019

I'm working to set up Jupyter notebook servers on Kubernetes that are able to launch pyspark. Each user can have multiple servers running at once, and would access each by navigating to the appropriate host combined with a path to the server's fully-qualified name. For example: http://<hostname>/<username>/<notebook server name>.

I have a top-level function defined that allows a user to create a SparkSession that points to the Kubernetes master URL and sets their pod to be the Spark driver.

This is all well and good, but I would like to enable end users to access the URL for the Spark Web UI so that they can track their jobs. The Spark on Kubernetes documentation has port forwarding as its recommended scheme for achieving this. It seems to me that for any security-minded organization, allowing any random user to set up port forwarding in this way would be unacceptable.

I would like to use an Ingress Kubernetes definition to allow external access to the driver's Spark Web UI. I've set up something like the following:

# Service
apiVersion: v1
kind: Service
metadata:
  namespace: <notebook namespace>
  name: <username>-<notebook server name>-svc
spec:
  type: ClusterIP
  sessionAffinity: None
  selector:
    app: <username>-<notebook server name>-notebook
  ports:
  - name: app-svc-port
    protocol: TCP
    port: 8888
    targetPort: 8888
  - name: spark-ui-port
    protocol: TCP
    port: 4040
    targetPort: 4040

# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: workspace
  name: <username>-<notebook server name>-ing
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: <hostname>
    http:
      paths:
      - path: /<username>/<notebook server name>
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: app-svc-port
      - path: /<username>/<notebook server name>/spark-ui
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: spark-ui-port

However, under this setup, when I navigate to http://<hostname>/<username>/<notebook server name>/spark-ui/, I'm redirected to http://<hostname>/jobs. This is because /jobs is the default entry point to Spark's Web UI. I don't have an ingress rule for that path, and can't set such a rule, since all users' Web UIs would collide with each other in the load balancer (unless I have a misunderstanding, which is totally possible).

Under the Spark UI configuration settings, there doesn't seem to be a way to set a root path for the Spark session. You can change the port on which it runs, but what I'd like to do is make the UI serve at something like: http://<hostname>/<username>/<notebook server name>/spark-ui/<jobs, stages, etc>. Is there really no way of changing what comes after the hostname of the URL and before the last part?

-- PMende
apache-spark
jupyter
kubernetes
pyspark
python

1 Answer

11/7/2019

Yes, you can achieve this by setting the spark.ui.proxyBase property, either in spark-defaults.conf or at run time.

Example:

echo "spark.ui.proxyBase $SPARK_UI_PROXYBASE" >> /opt/spark/conf/spark-defaults.conf;

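With the property set to the path prefix that your ingress exposes (for example /<username>/<notebook server name>/spark-ui), the links the UI generates should be prefixed with that path, so this should work.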
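Since the question creates the SparkSession from a top-level Python function, the run-time variant might look roughly like the sketch below. The function names (spark_ui_proxy_base, create_session) and the "notebook" app name are hypothetical and not part of Spark. The sketch also assumes the property is picked up when passed through the session builder before the driver JVM starts, which may depend on your Spark version.

```python
def spark_ui_proxy_base(username: str, server_name: str) -> str:
    """Build the path prefix under which the ingress exposes the Spark UI.

    Hypothetical helper: mirrors the /<username>/<notebook server name>/spark-ui
    ingress path from the question.
    """
    parts = [username.strip("/"), server_name.strip("/"), "spark-ui"]
    return "/" + "/".join(parts)


def create_session(username: str, server_name: str, app_name: str = "notebook"):
    """Create a SparkSession with spark.ui.proxyBase set to the ingress path.

    Assumption: no SparkSession is running yet in this process; getOrCreate()
    would otherwise return the existing session.
    """
    # Imported lazily so the path helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.ui.proxyBase", spark_ui_proxy_base(username, server_name))
        .getOrCreate()
    )
```

Note that, as far as I know, spark.ui.proxyBase only changes the links the UI generates. The UI itself still serves at /jobs and so on, so the ingress may also need to strip the prefix before forwarding requests to port 4040.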

-- theMJof91
Source: StackOverflow