Spark on Kubernetes: executor pods not able to start while creating SparkContext

5/13/2021

I am trying to run Spark on Kubernetes with interactive commands run through the Spark shell or a Jupyter interface. I have built custom images for both the driver pod and the executor pods, and I use the code below to spin up the SparkContext:

import pyspark
conf = pyspark.SparkConf()
conf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
conf.set(
    "spark.kubernetes.container.image", 
    "<Repo>/<IMAGENAME>:latest") 

conf.set("spark.kubernetes.namespace", "default")

# Authentication certificate and token (required to create worker pods):
conf.set(
    "spark.kubernetes.authenticate.caCertFile", 
    "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
conf.set(
    "spark.kubernetes.authenticate.oauthTokenFile", 
    "/var/run/secrets/kubernetes.io/serviceaccount/token")

conf.set(
    "spark.kubernetes.authenticate.driver.serviceAccountName", 
    "spark-master") 
conf.set("spark.executor.instances", "2") 
conf.set(
    "spark.driver.host", "spark-test-jupyter") 
conf.set("spark.executor.memory", "1g")
conf.set("spark.executor.cores", "1")
conf.set("spark.driver.blockManager.port", "7777")
conf.set("spark.driver.bindAddress", "0.0.0.0")

conf.set("spark.driver.port", "29416") 

sc = pyspark.SparkContext(conf=conf)

The driver launches two executor pods, but they error out shortly after starting, and a new pair of pods then does the same. Pod status output below:

pyspark-shell-1620894878554-exec-8   0/1     Pending             0          0s
pyspark-shell-1620894878554-exec-8   0/1     ContainerCreating   0          0s
pyspark-shell-1620894878528-exec-7   1/1     Running             0          1s
pyspark-shell-1620894878554-exec-8   1/1     Running             0          2s
pyspark-shell-1620894878528-exec-7   0/1     Error               0          4s
pyspark-shell-1620894878554-exec-8   0/1     Error               0          4s
pyspark-shell-1620894878528-exec-7   0/1     Terminating         0          5s
pyspark-shell-1620894878528-exec-7   0/1     Terminating         0          5s
pyspark-shell-1620894878554-exec-8   0/1     Terminating         0          5s
pyspark-shell-1620894878554-exec-8   0/1     Terminating         0          5s
pyspark-shell-1620894883595-exec-9   0/1     Pending             0          0s
pyspark-shell-1620894883595-exec-9   0/1     Pending             0          0s
pyspark-shell-1620894883595-exec-9   0/1     ContainerCreating   0          0s
pyspark-shell-1620894883623-exec-10   0/1     Pending             0          0s
pyspark-shell-1620894883623-exec-10   0/1     Pending             0          0s
pyspark-shell-1620894883623-exec-10   0/1     ContainerCreating   0          0s
pyspark-shell-1620894883595-exec-9    1/1     Running             0          1s
pyspark-shell-1620894883623-exec-10   1/1     Running             0          3s

This cycle repeats endlessly until stopped.

What could be going wrong here?

-- Avik Aggarwal
amazon-eks
apache-spark
docker
kubernetes
pyspark

1 Answer

5/21/2021

Your spark.driver.host should be the DNS name of the service fronting the driver, so something like spark-test-jupyter.default.svc.cluster.local
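Applied to the question's setup, the change would look something like the sketch below (assuming the driver pod is exposed by a service named spark-test-jupyter in the default namespace, and that the service exposes the driver and block manager ports so executors can connect back):

```python
import pyspark

conf = pyspark.SparkConf()
# Fully qualified DNS name of the service in front of the driver pod
# (assumes service "spark-test-jupyter" in namespace "default"):
conf.set("spark.driver.host", "spark-test-jupyter.default.svc.cluster.local")
# These ports must also be declared on that service, since executors
# dial back to the driver on them:
conf.set("spark.driver.port", "29416")
conf.set("spark.driver.blockManager.port", "7777")
conf.set("spark.driver.bindAddress", "0.0.0.0")
```

Without a resolvable spark.driver.host, each executor starts, fails to reach the driver, and exits, which produces the endless start/error/terminate loop shown in the pod listing.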

-- pltc
Source: StackOverflow