Pod template for specifying tolerations when running Spark on Kubernetes


I am having some issues when trying to launch Spark jobs via the Kubernetes scheduler.

I want all my driver/executor pods to be scheduled onto nodes that have a certain taint. To do that, I want to specify tolerations that are injected directly into the pods' configuration. Currently, there is no built-in way to do this directly from the spark-submit command.

According to this and this, a user should be able to specify a pod template which can be set with the following parameters: spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile.

I tried specifying those parameters in the spark-submit command with the following file:


apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: test

However, this toleration never gets added to the launched driver pod. Is there currently a way to solve this?

For reference, here is the full spark-submit command:

/opt/spark/bin/spark-submit \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.options.claimName=pvc-storage \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.mount.subPath=test-stage1/spark \
  --conf spark.executor.memory=1G \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.mount.subPath=test-stage1/spark \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=test-stage1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.mount.path=/persistent \
  --conf spark.kubernetes.driver.limit.memory=3G \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.mount.path=/persistent \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image=<SPARK IMAGE> \
  --conf spark.master=k8s://https://kubernetes.default.svc \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.options.claimName=pvc-storage \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.executor.podTemplateFile=//opt/pod_template.template \
  --conf spark.kubernetes.driver.podTemplateFile=//opt/pod_template.template \
  local:///opt/spark/examples/src/main/python/pi.py 100

-- toerq

2 Answers


You didn't specify which version of Spark you are using. I don't think spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile are available before Spark 3.0.

Here is the Spark JIRA issue that added support for these two configuration options. It is only resolved for the Spark 3.0 branch.
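
As a rough way to check this, the sketch below tests whether a Spark version string is 3.0 or newer. The helper name supports_pod_template is hypothetical and the version strings are hard-coded examples; in practice you would read the version from the banner printed by /opt/spark/bin/spark-submit --version.

```shell
# Hypothetical helper: succeeds if the given Spark version string is 3.0 or newer.
supports_pod_template() {
  major="${1%%.*}"   # keep only the major version, e.g. "2" from "2.4.5"
  [ "$major" -ge 3 ]
}

# Example version strings (assumptions); substitute the version reported by
# /opt/spark/bin/spark-submit --version
for v in 2.4.5 3.0.0; do
  if supports_pod_template "$v"; then
    echo "$v: podTemplateFile options should be available"
  else
    echo "$v: podTemplateFile options are likely not supported"
  fi
done
```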

-- francoisf

I checked the documentation and found a few things that might be misconfigured here:

  1. Your pod_template.template file should have a .yaml extension
  2. You did not specify spark.kubernetes.driver.pod.name, either in your spark-submit command or as metadata in pod_template.template.yaml
  3. You used a double slash (//) in the paths for spark.kubernetes.driver.podTemplateFile= and spark.kubernetes.executor.podTemplateFile=
  4. You should put all your toleration values in double quotes, for example: effect: "NoSchedule"

Please let me know if that helped.
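
A minimal sketch of the suggestions above, assuming a hypothetical file name pod_template.yaml in the current directory (the original used /opt/pod_template.template). It writes a template with quoted toleration values nested under spec.tolerations, then prints the two options to pass to spark-submit. Whether these changes alone fix the issue is not confirmed.

```shell
# Hypothetical path; adjust to where the template lives in your image.
TEMPLATE="${TEMPLATE:-./pod_template.yaml}"

# Write the pod template: tolerations belong under spec, values are quoted.
cat > "$TEMPLATE" <<'EOF'
apiVersion: v1
kind: Pod
spec:
  tolerations:
    - effect: "NoSchedule"
      key: "dedicated"
      operator: "Equal"
      value: "test"
EOF

# Options to add to the spark-submit command
# (use a single leading slash if you pass an absolute path).
echo "--conf spark.kubernetes.driver.podTemplateFile=$TEMPLATE"
echo "--conf spark.kubernetes.executor.podTemplateFile=$TEMPLATE"
```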

-- OhHiMark
