I am trying to run Spark on Kubernetes with interactive commands issued through the Spark shell or a Jupyter interface. I have built custom images for both the driver pod and the executor pods, and I use the code below to spin up the SparkContext:
import pyspark

conf = pyspark.SparkConf()
# Point the master at the in-cluster Kubernetes API server
conf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
# Custom image used for the executor pods
conf.set(
    "spark.kubernetes.container.image",
    "<Repo>/<IMAGENAME>:latest")
conf.set("spark.kubernetes.namespace", "default")
# Authentication certificate and token (required to create executor pods):
conf.set(
    "spark.kubernetes.authenticate.caCertFile",
    "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
conf.set(
    "spark.kubernetes.authenticate.oauthTokenFile",
    "/var/run/secrets/kubernetes.io/serviceaccount/token")
conf.set(
    "spark.kubernetes.authenticate.driver.serviceAccountName",
    "spark-master")
conf.set("spark.executor.instances", "2")
# Hostname the executors use to reach back to the driver
conf.set("spark.driver.host", "spark-test-jupyter")
conf.set("spark.executor.memory", "1g")
conf.set("spark.executor.cores", "1")
conf.set("spark.driver.blockManager.port", "7777")
conf.set("spark.driver.bindAddress", "0.0.0.0")
conf.set("spark.driver.port", "29416")
sc = pyspark.SparkContext(conf=conf)
The driver launches executor pods, but the two executors start, error out after a few seconds, and a new pair is created that does the same. Pod watch output below:
pyspark-shell-1620894878554-exec-8 0/1 Pending 0 0s
pyspark-shell-1620894878554-exec-8 0/1 ContainerCreating 0 0s
pyspark-shell-1620894878528-exec-7 1/1 Running 0 1s
pyspark-shell-1620894878554-exec-8 1/1 Running 0 2s
pyspark-shell-1620894878528-exec-7 0/1 Error 0 4s
pyspark-shell-1620894878554-exec-8 0/1 Error 0 4s
pyspark-shell-1620894878528-exec-7 0/1 Terminating 0 5s
pyspark-shell-1620894878528-exec-7 0/1 Terminating 0 5s
pyspark-shell-1620894878554-exec-8 0/1 Terminating 0 5s
pyspark-shell-1620894878554-exec-8 0/1 Terminating 0 5s
pyspark-shell-1620894883595-exec-9 0/1 Pending 0 0s
pyspark-shell-1620894883595-exec-9 0/1 Pending 0 0s
pyspark-shell-1620894883595-exec-9 0/1 ContainerCreating 0 0s
pyspark-shell-1620894883623-exec-10 0/1 Pending 0 0s
pyspark-shell-1620894883623-exec-10 0/1 Pending 0 0s
pyspark-shell-1620894883623-exec-10 0/1 ContainerCreating 0 0s
pyspark-shell-1620894883595-exec-9 1/1 Running 0 1s
pyspark-shell-1620894883623-exec-10 1/1 Running 0 3s
This cycle repeats endlessly until the context is stopped.
What could be going wrong here?
Your spark.driver.host should be the DNS name of the service fronting the driver, so something like spark-test-jupyter.default.svc.cluster.local. The executors use this hostname to connect back to the driver; if it doesn't resolve from the executor pods, they fail on startup and Spark keeps requesting replacements, which matches the crash loop you're seeing.
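A minimal sketch of the corrected setting, assuming the driver pod is exposed through a (possibly headless) service named spark-test-jupyter in the default namespace; adjust the name and namespace to match your cluster:

# Fully qualified form: <service>.<namespace>.svc.cluster.local
conf.set(
    "spark.driver.host",
    "spark-test-jupyter.default.svc.cluster.local")

Whatever service you use also needs to expose spark.driver.port (29416) and spark.driver.blockManager.port (7777), since the executors connect back to the driver on both ports.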