How to include a security context when running a spark-submit job on Kubernetes

11/22/2021

I'm using Spark 2.4.5 to run a Spark application on Kubernetes through the spark-submit command. The application fails while trying to write outputs, as detailed here, probably due to an incorrect security context. So I tried setting up a security context and re-running the application. I did this by creating a pod template as mentioned here, but I haven't been able to validate whether the pod template is set up properly (because I couldn't find proper examples) or whether it is accessible from the driver and executor pods (since I couldn't find anything related to the template in the driver or Kubernetes logs). This is the content of the pod template I used to set a security context:

apiVersion: v1
kind: Pod
metadata:
  name: spark-pod-template
spec:
  securityContext:
    runAsUser: 1000
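
In case it helps, this is the fuller form of the template I considered. The fsGroup value is an assumption on my part, added because the write failure looks permission-related; it is not something I've confirmed:

```yaml
# Sketch of a pod template with pod-level security settings applied to
# all containers in the pod. The fsGroup value below is an assumption
# (it sets group ownership on mounted volumes, which may matter for
# writing to the PVC), not a confirmed part of my setup.
apiVersion: v1
kind: Pod
metadata:
  name: spark-pod-template
spec:
  securityContext:
    runAsUser: 1000   # run all containers as UID 1000
    fsGroup: 1000     # assumed: group applied to mounted volumes
```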

This is the command I used.

 <SPARK_PATH>/bin/spark-submit --master k8s://https://dssparkcluster-dns-fa326f6a.hcp.southcentralus.azmk8s.io:443 \
 --deploy-mode cluster --name spark-pi3 --conf spark.executor.instances=2 \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14 \
 --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
 --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
 --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
 --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
 --conf spark.kubernetes.driver.podTemplateFile=/opt/spark/work-dir/spark_pod_template.yml \
 --conf spark.kubernetes.executor.podTemplateFile=/opt/spark/work-dir/spark_pod_template.yml \
 --verbose /opt/spark/work-dir/wordcount2.py

I've placed the pod template file in a persistent volume mounted at /opt/spark/work-dir. The questions I have are:

  1. Is the pod template file accessible from the persistent volume?
  2. Are the file contents in the appropriate format for setting a runAsUser?
  3. Is the pod template functionality supported in Spark 2.4.5? The 2.4.5 docs mention that security contexts can be set up using pod templates, but there is no pod template section like the one in the 3.2.0 docs.
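
For question 2, I wrote a quick check to convince myself the keys are nested correctly. This is only a rough sketch using a crude indentation scan (a real YAML parser such as PyYAML would be more robust), and the helper name is my own:

```python
# Rough structural sanity check for the pod template: verifies that
# runAsUser appears nested under spec -> securityContext. This is a
# sketch using a crude line/indentation scan, not a real YAML parser.
TEMPLATE = """\
apiVersion: v1
kind: Pod
metadata:
  name: spark-pod-template
spec:
  securityContext:
    runAsUser: 1000
"""

def has_run_as_user(text):
    """Return True if runAsUser is nested under spec.securityContext."""
    in_spec = in_ctx = False
    for line in text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "spec:" and indent == 0:
            in_spec, in_ctx = True, False
        elif in_spec and stripped == "securityContext:" and indent == 2:
            in_ctx = True
        elif in_ctx and stripped.startswith("runAsUser:") and indent == 4:
            return True
        elif indent == 0:
            # any other top-level key ends the spec section
            in_spec = in_ctx = False
    return False

print(has_run_as_user(TEMPLATE))  # → True
```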

Any help would be greatly appreciated. Thanks.

-- Maaverik
apache-spark
kubernetes
pyspark
security-context

0 Answers