Cannot launch SparkPi example on Kubernetes Spark 2.4.0

1/16/2019

I've been trying to run the SparkPi example on Kubernetes with Spark 2.4.0, and it doesn't seem to behave at all as described in the documentation.

I followed the guide: I built a vanilla Docker image with the docker-image-tool.sh script and pushed it to my registry.
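
For reference, a vanilla image is built and pushed with docker-image-tool.sh roughly like this (the registry address matches the pod spec further down; the tag is a placeholder):

# Build the vanilla Spark image from the unpacked Spark 2.4.0 distribution.
./bin/docker-image-tool.sh -r my-repo:10001 -t <tag> build

# Push it to the private registry so the cluster nodes can pull it.
./bin/docker-image-tool.sh -r my-repo:10001 -t <tag> push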

I launch the job from my spark folder with a command like this:

bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.namespace=mynamespace \
    --conf spark.kubernetes.container.image.pullSecrets=myPullSecret \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

This is virtually the same as in the documentation, except for the namespace and pullSecrets options, which I need because of constraints in a multi-user Kubernetes environment. Even so, I tried using the default namespace and got the same outcome.
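
For completeness, a registry pull secret like myPullSecret is typically created in the target namespace along these lines (server and credentials are placeholders):

# Create the image pull secret referenced by spark.kubernetes.container.image.pullSecrets.
kubectl create secret docker-registry myPullSecret \
    --docker-server=my-repo:10001 \
    --docker-username=<username> \
    --docker-password=<password> \
    --namespace=mynamespace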

What happens is that the pod ends up in the Failed state and two abnormal conditions occur:

  • There's an error: MountVolume.SetUp failed for volume "spark-conf-volume" : configmaps "spark-pi-1547643379283-driver-conf-map" not found. This indicates that Kubernetes could not mount the ConfigMap to /opt/spark/conf, which should contain a properties file. The ConfigMap (with that exact name) exists, so I don't understand why Kubernetes cannot mount it (see the kubectl checks below).
  • In the container log, several essential environment variables in the launch command are empty.
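
The kubectl checks mentioned above look roughly like this (namespace and resource names are taken from this particular run):

# Confirm the ConfigMap named in the mount error exists in the namespace.
kubectl -n frank get configmap spark-pi-1547643379283-driver-conf-map

# Inspect the pod events for the MountVolume.SetUp failure and pull the
# container log quoted below.
kubectl -n frank describe pod spark-pi-1547643379283-driver
kubectl -n frank logs spark-pi-1547643379283-driver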

container log:

CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -cp ':/opt/spark/jars/*' -Xms -Xmx -Dspark.driver.bindAddress=10.11.12.13

You can control some of these variables directly with properties such as spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS, but this should not be necessary since, in this example, the class is already specified with --class.
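
For illustration, that workaround would amount to appending something like this to the spark-submit invocation above (it should not be required when the image matches Spark 2.4.0):

# Force the entrypoint's SPARK_DRIVER_CLASS environment variable explicitly.
    --conf spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS=org.apache.spark.examples.SparkPi \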

For clarity, the following environment variables are empty:

  • SPARK_DRIVER_MEMORY
  • SPARK_DRIVER_CLASS
  • SPARK_DRIVER_ARGS

The SPARK_CLASSPATH is also missing the container-local jar I specified on the command line (spark-examples_2.11-2.4.0.jar).

It seems that even if we resolve the problem with mounting the ConfigMap, it won't help populate SPARK_DRIVER_MEMORY, because the ConfigMap does not contain an equivalent configuration parameter.

How do I resolve the problem of mounting the ConfigMap, and how do I get these environment variables populated?

The Kubernetes pod and ConfigMap definitions below were created by Spark, but I am posting them here in case they help:
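
(Dumps like the ones below can be obtained with commands along these lines, using the names from this run.)

# Dump the driver pod and its ConfigMap exactly as spark-submit created them.
kubectl -n frank get pod spark-pi-1547644451461-driver -o json
kubectl -n frank get configmap spark-pi-1547644451461-driver-conf-map -o json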

pod-spec.yaml

    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata": {
        "name": "spark-pi-1547644451461-driver",
        "namespace": "frank",
        "selfLink": "/api/v1/namespaces/frank/pods/spark-pi-1547644451461-driver",
        "uid": "90c9577c-1990-11e9-8237-00155df6cf35",
        "resourceVersion": "19241392",
        "creationTimestamp": "2019-01-16T13:13:50Z",
        "labels": {
          "spark-app-selector": "spark-6eafcf5825e94637974f39e5b8512028",
          "spark-role": "driver"
        }
      },
      "spec": {
        "volumes": [
          {
            "name": "spark-local-dir-1",
            "emptyDir": {}
          },
          {
            "name": "spark-conf-volume",
            "configMap": {
              "name": "spark-pi-1547644451461-driver-conf-map",
              "defaultMode": 420
            }
          },
          {
            "name": "default-token-rfz9m",
            "secret": {
              "secretName": "default-token-rfz9m",
              "defaultMode": 420
            }
          }
        ],
        "containers": [
          {
            "name": "spark-kubernetes-driver",
            "image": "my-repo:10001/spark:latest",
            "args": [
              "driver",
              "--properties-file",
              "/opt/spark/conf/spark.properties",
              "--class",
              "org.apache.spark.examples.SparkPi",
              "spark-internal"
            ],
            "ports": [
              {
                "name": "driver-rpc-port",
                "containerPort": 7078,
                "protocol": "TCP"
              },
              {
                "name": "blockmanager",
                "containerPort": 7079,
                "protocol": "TCP"
              },
              {
                "name": "spark-ui",
                "containerPort": 4040,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "SPARK_DRIVER_BIND_ADDRESS",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "status.podIP"
                  }
                }
              },
              {
                "name": "SPARK_LOCAL_DIRS",
                "value": "/var/data/spark-368106fd-09e1-46c5-a443-eec0b64b5cd9"
              },
              {
                "name": "SPARK_CONF_DIR",
                "value": "/opt/spark/conf"
              }
            ],
            "resources": {
              "limits": {
                "memory": "1408Mi"
              },
              "requests": {
                "cpu": "1",
                "memory": "1408Mi"
              }
            },
            "volumeMounts": [
              {
                "name": "spark-local-dir-1",
                "mountPath": "/var/data/spark-368106fd-09e1-46c5-a443-eec0b64b5cd9"
              },
              {
                "name": "spark-conf-volume",
                "mountPath": "/opt/spark/conf"
              },
              {
                "name": "default-token-rfz9m",
                "readOnly": true,
                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
              }
            ],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          }
        ],
        "restartPolicy": "Never",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default",
        "serviceAccount": "default",
        "nodeName": "kube-worker16",
        "securityContext": {},
        "imagePullSecrets": [
          {
            "name": "mypullsecret"
          }
        ],
        "schedulerName": "default-scheduler",
        "tolerations": [
          {
            "key": "node.kubernetes.io/not-ready",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          },
          {
            "key": "node.kubernetes.io/unreachable",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          }
        ]
      },
      "status": {
        "phase": "Failed",
        "conditions": [
          {
            "type": "Initialized",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:15:11Z"
          },
          {
            "type": "Ready",
            "status": "False",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:15:11Z",
            "reason": "ContainersNotReady",
            "message": "containers with unready status: [spark-kubernetes-driver]"
          },
          {
            "type": "ContainersReady",
            "status": "False",
            "lastProbeTime": null,
            "lastTransitionTime": null,
            "reason": "ContainersNotReady",
            "message": "containers with unready status: [spark-kubernetes-driver]"
          },
          {
            "type": "PodScheduled",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:13:50Z"
          }
        ],
        "hostIP": "10.1.2.3",
        "podIP": "10.11.12.13",
        "startTime": "2019-01-16T13:15:11Z",
        "containerStatuses": [
          {
            "name": "spark-kubernetes-driver",
            "state": {
              "terminated": {
                "exitCode": 1,
                "reason": "Error",
                "startedAt": "2019-01-16T13:15:23Z",
                "finishedAt": "2019-01-16T13:15:23Z",
                "containerID": "docker://931908c3cfa6c2607c9d493c990b392f1e0a8efceff0835a16aa12afd03ec275"
              }
            },
            "lastState": {},
            "ready": false,
            "restartCount": 0,
            "image": "my-repo:10001/spark:latest",
            "imageID": "docker-pullable://my-repo:10001/spark@sha256:58e319143187d3a0df14ceb29a874b35756c4581265f0e1de475390a2d3e6ed7",
            "containerID": "docker://931908c3cfa6c2607c9d493c990b392f1e0a8efceff0835a16aa12afd03ec275"
          }
        ],
        "qosClass": "Burstable"
      }
    }

config-map.yml

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "spark-pi-1547644451461-driver-conf-map",
    "namespace": "frank",
    "selfLink": "/api/v1/namespaces/frank/configmaps/spark-pi-1547644451461-driver-conf-map",
    "uid": "90eda9e3-1990-11e9-8237-00155df6cf35",
    "resourceVersion": "19241350",
    "creationTimestamp": "2019-01-16T13:13:50Z",
    "ownerReferences": [
      {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": "spark-pi-1547644451461-driver",
        "uid": "90c9577c-1990-11e9-8237-00155df6cf35",
        "controller": true
      }
    ]
  },
  "data": {
    "spark.properties": "#Java properties built from Kubernetes config map with name: spark-pi-1547644451461-driver-conf-map\r\n#Wed Jan 16 13:14:12 GMT 2019\r\nspark.kubernetes.driver.pod.name=spark-pi-1547644451461-driver\r\nspark.driver.host=spark-pi-1547644451461-driver-svc.frank.svc\r\nspark.kubernetes.container.image=aow-repo\\:10001/spark\\:latest\r\nspark.kubernetes.container.image.pullSecrets=mypullsecret\r\nspark.executor.instances=5\r\nspark.app.id=spark-6eafcf5825e94637974f39e5b8512028\r\nspark.app.name=spark-pi\r\nspark.driver.port=7078\r\nspark.kubernetes.resource.type=java\r\nspark.master=k8s\\://https\\://10.1.2.2\\:6443\r\nspark.kubernetes.python.pyFiles=\r\nspark.kubernetes.executor.podNamePrefix=spark-pi-1547644451461\r\nspark.kubernetes.namespace=frank\r\nspark.driver.blockManager.port=7079\r\nspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar\r\nspark.submit.deployMode=cluster\r\nspark.kubernetes.submitInDriver=true\r\n"
  }
}
-- Frank Wilson
apache-spark
kubernetes

2 Answers

1/17/2019

I think the problem is mostly that my Docker 'latest' tag was pointing at an image built for the previous version of Spark (v2.3.2). It seems the way the container receives parameters from spark-submit and Kubernetes changed a bit between versions. My remaining problems launching Spark pipelines seem to be with serviceAccounts (and probably belong in another question).
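
One way to confirm the mismatch and pin the image going forward is sketched below; the image name comes from the pod spec in the question, and the explicit 2.4.0 tag is illustrative.

# Check which Spark version the 'latest' tag actually contains by listing the
# bundled jars (the image entrypoint is bypassed so this also works on a 2.3.x image).
docker run --rm --entrypoint ls my-repo:10001/spark:latest /opt/spark/jars | grep spark-core

# Rebuild/push with an explicit version tag (see the docker-image-tool.sh step in
# the question) and reference that tag instead of 'latest':
#   --conf spark.kubernetes.container.image=my-repo:10001/spark:2.4.0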

-- Frank Wilson
Source: StackOverflow

10/22/2019

Spark on Kubernetes has a bug.

During Spark job submission to the Kubernetes cluster, we first create the Spark Driver Pod: https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L130.

Only after that do we create all the other resources (e.g. the Spark Driver Service), including the ConfigMap: https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L135.

We do that to be able to set the Spark Driver Pod as the ownerReference of all those resources (which cannot be done before we create the owner Pod): https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L134.

This lets us delegate the deletion of all those resources to Kubernetes, which makes it easier to clean up unused resources in the cluster: all we need to do is delete the Spark Driver Pod. But there is a risk that Kubernetes will start the Spark Driver Pod before the ConfigMap is ready, which causes the issue you are seeing.
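
For what it's worth, the ownership chain (and the cleanup it enables) can be inspected with something like the following, using the names from the question:

# Show which object owns the ConfigMap; it should print Pod/spark-pi-1547644451461-driver.
kubectl -n frank get configmap spark-pi-1547644451461-driver-conf-map \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'

# Because of that ownerReference, deleting the driver pod is enough to garbage-collect
# the ConfigMap along with the other owned resources.
kubectl -n frank delete pod spark-pi-1547644451461-driver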

This is still true for 2.4.4.

-- Aliaksandr Sasnouskikh
Source: StackOverflow