Spark executor pods go into error shortly after creation when using Kubernetes as master

7/23/2019

When I launch the SparkPi example on a self-hosted Kubernetes cluster, the executor pods are quickly created -> go into an error status -> are deleted -> are replaced by new executor pods.

I tried the same command on Google Kubernetes Engine with success. I checked the RBAC role binding to make sure that the service account has the right to create pods.
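For reference, this is roughly how I set up and verified the permissions, following the RBAC section of the Spark on Kubernetes documentation (the spark service account and the default namespace below match my setup; adjust as needed):

# Service account and role binding, as suggested in the Spark on Kubernetes docs
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
    --serviceaccount=default:spark --namespace=default

# Confirm the service account is actually allowed to create pods
kubectl auth can-i create pods --as=system:serviceaccount:default:spark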

By guessing when the next executor pod will appear, I can see with kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:

Events:
  Type    Reason     Age   From                   Message
  ----    ------     ----  ----                   -------
  Normal  Scheduled  1s    default-scheduler      Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
  Normal  Pulling    0s    kubelet, slave-node04  Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
  Normal  Pulled     0s    kubelet, slave-node04  Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
  Normal  Created    0s    kubelet, slave-node04  Created container executor
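Instead of guessing the pod name, something like the following can also be used to catch the short-lived executors (this assumes the executors carry the spark-role=executor label that Spark normally applies to executor pods):

# Watch executor pods as they are created and deleted
kubectl get pods -w -l spark-role=executor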

This is my spark-submit call:

/opt/spark/bin/spark-submit \
    --master k8s://https://mycustomk8scluster:6443 \
    --name examplepi \
    --deploy-mode cluster \
    --driver-memory 2G \
    --executor-memory 2G \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
    --conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
    --conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.driver.pod.name=pi-driver \
    --conf spark.driver.allowMultipleContexts=true \
    --conf spark.kubernetes.local.dirs.tmpfs=true \
    --class com.olameter.sdi.imagery.IngestFromGrpc \
    --class org.apache.spark.examples.SparkPi \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100

I expect the required executors (2) to be created. If the driver cannot create them, I would at least expect some log output to help diagnose the issue.
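These are the commands I would typically use to pull whatever logs exist before the failing executors are cleaned up (pod names as shown above; the driver pod name comes from spark.kubernetes.driver.pod.name):

# Driver log
kubectl logs pi-driver

# Log of a named executor pod while it still exists
kubectl logs examplepi-1563878435019-exec-145

# Cluster events, oldest first, to see scheduling / image pull / container failure reasons
kubectl get events --sort-by=.metadata.creationTimestamp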

-- Jean-Denis Giguère
apache-spark
kubernetes

1 Answer

8/3/2019

The issue was related to the Hadoop + Spark integration. I was using a Spark binary built without Hadoop (spark-2.4.3-bin-without-hadoop.tgz) together with Hadoop 3.1.2. Configuring the integration through environment variables seemed to be problematic for the Spark executors.

I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.
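For anyone following the same route, the build looked roughly like this. This is a sketch based on the linked documentation, with the Kubernetes resource manager profile enabled; depending on your environment, additional profiles from that page may be needed:

# Build Spark against Hadoop 3.1.2 with Kubernetes support
./build/mvn -Pkubernetes -Dhadoop.version=3.1.2 -DskipTests clean package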

-- Jean-Denis Giguère
Source: StackOverflow