I am having some issues when trying to launch Spark jobs via the Kubernetes scheduler.
I want all my driver/executor pods to be spawned on nodes that have a certain taint. Because of this, I want to specify tolerations that get injected directly into the pods' specifications. Currently, there is no way to do this directly from the spark-submit command.
According to this and this, a user should be able to specify a pod template via the following parameters: spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile.
I tried specifying those parameters in the spark-submit command with the following file:

pod_template.template
apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: test
However, this toleration never gets added to the launched driver pod. Is there currently a way to solve this?
For reference, here is the full spark-submit command:

/opt/spark/bin/spark-submit \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.options.claimName=pvc-storage \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.mount.subPath=test-stage1/spark \
  --conf spark.executor.memory=1G \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.mount.subPath=test-stage1/spark \
  --conf spark.kubernetes.executor.limit.cores=1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=test-stage1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.mount.path=/persistent \
  --conf spark.kubernetes.driver.limit.memory=3G \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.persistent.mount.path=/persistent \
  --conf spark.submit.deployMode=cluster \
  --conf spark.kubernetes.container.image=<SPARK IMAGE> \
  --conf spark.master=k8s://https://kubernetes.default.svc \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.persistent.options.claimName=pvc-storage \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.executor.podTemplateFile=//opt/pod_template.template \
  --conf spark.kubernetes.driver.podTemplateFile=//opt/pod_template.template \
  local:///opt/spark/examples/src/main/python/pi.py 100
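This is how I checked that the toleration is missing from the launched driver pod (substituting the generated driver pod name, which differs per run):

kubectl -n test-stage1 get pod <driver-pod-name> -o jsonpath='{.spec.tolerations}'

The output does not include the dedicated toleration.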
You didn't specify which version of Spark you are using. I don't think spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile are available until Spark 3.0.
Here is the Spark JIRA issue which added support for the above two configuration options. It is only resolved for the Spark 3.0 branch.
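If you're unsure, you can check which version your image actually ships from inside the container:

/opt/spark/bin/spark-submit --version

On a 2.4.x image the two podTemplateFile options would simply be ignored, which would match the behaviour you're seeing.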
I have checked various documentation and found a few things that might be misconfigured here:

- pod_template.template should have the .yaml extension at the end.
- You did not specify spark.kubernetes.driver.pod.name in your spark-submit command, nor in pod_template.template.yaml in the form of metadata.
- You have a double slash (//) when specifying the path for spark.kubernetes.driver.podTemplateFile= and spark.kubernetes.executor.podTemplateFile=.
- All the values in the template should be wrapped in "", for example: effect: "NoSchedule".

A template with all of the above applied is sketched below.
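For illustration, here is a sketch of the corrected file, renamed to pod_template.yaml (the metadata name spark-pi-driver is only an example):

apiVersion: "v1"
kind: "Pod"
metadata:
  name: "spark-pi-driver"
spec:
  tolerations:
  - effect: "NoSchedule"
    key: "dedicated"
    operator: "Equal"
    value: "test"

passed with single-slash paths:

--conf spark.kubernetes.driver.podTemplateFile=/opt/pod_template.yaml \
--conf spark.kubernetes.executor.podTemplateFile=/opt/pod_template.yaml

(Alternatively, spark.kubernetes.driver.pod.name=spark-pi-driver can be set directly on spark-submit instead of via metadata.)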
Please let me know if that helped.