pending spark pod on google kubernetes cluster: insufficient cpu

10/11/2018

I'm trying to submit a Spark job through spark-submit to a Google Kubernetes Engine (GKE) cluster.

The Docker image is built from the official Spark Dockerfile shipped with the 2.3.0 release.
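For reference, the 2.3.0 distribution can build and push an image from that Dockerfile with the bundled docker-image-tool.sh, something like (repository and tag below are placeholders):

# run from the unpacked spark-2.3.0 distribution
./bin/docker-image-tool.sh -r <my repo> -t v2 build
./bin/docker-image-tool.sh -r <my repo> -t v2 push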

The following is the submit script.

#! /bin/bash
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.app.name=app-name \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar

I can run this on my local minikube perfectly.

However, when I submit it to my Google Kubernetes Engine cluster, the pod always stays unscheduled due to insufficient CPU:

0/3 nodes are available: 3 Insufficient cpu. 

kubectl describe node looks okay, and here is the describe output for the problematic pod:

Name:         spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
Namespace:    default
Node:         <none>
Labels:       spark-app-selector=spark-3e8ff877bebd46be9fc8d956531ba186
              spark-role=driver
Annotations:  spark-app-name=spark-pi
Status:       Pending
IP:           
Containers:
  spark-kubernetes-driver:
    Image:      geekbeta/spark:v2
    Port:       <none>
    Host Port:  <none>
    Args:
      driver
    Limits:
      memory:  1408Mi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
      SPARK_DRIVER_MEMORY:        1g
      SPARK_DRIVER_CLASS:         ExpletivePI
      SPARK_DRIVER_ARGS:          
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_MOUNTED_CLASSPATH:    /opt/spark/tang_stuff/spark-demo.jar:/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_0:           -Dspark.app.name=spark-pi
      SPARK_JAVA_OPT_1:           -Dspark.app.id=spark-3e8ff877bebd46be9fc8d956531ba186
      SPARK_JAVA_OPT_2:           -Dspark.driver.host=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver-svc.default.svc
      SPARK_JAVA_OPT_3:           -Dspark.submit.deployMode=cluster
      SPARK_JAVA_OPT_4:           -Dspark.driver.blockManager.port=7079
      SPARK_JAVA_OPT_5:           -Dspark.kubernetes.executor.podNamePrefix=spark-pi-e890cd00394b3b20942f22d0a9173c1c
      SPARK_JAVA_OPT_6:           -Dspark.master=k8s://https://35.229.152.59
      SPARK_JAVA_OPT_7:           -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark
      SPARK_JAVA_OPT_8:           -Dspark.executor.instances=1
      SPARK_JAVA_OPT_9:           -Dspark.kubernetes.container.image=geekbeta/spark:v2
      SPARK_JAVA_OPT_10:          -Dspark.kubernetes.driver.pod.name=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
      SPARK_JAVA_OPT_11:          -Dspark.jars=/opt/spark/tang_stuff/spark-demo.jar,/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_12:          -Dspark.driver.port=7078
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-9gdsb (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  spark-token-9gdsb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-9gdsb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m (x125 over 38m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

My cluster has 3 CPUs and 11 GB of RAM in total. I'm really confused and don't know what to do at this point; any advice or comments would be greatly appreciated. Thank you in advance!

-- dex
apache-spark
docker
google-cloud-platform
kubernetes

1 Answer

10/12/2018

Problem solved. It seems that the driver pod requests 1 full CPU by default, which in my case GKE cannot accommodate, since each node in my cluster has only one vCPU and not all of it is schedulable.
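Each GKE node reserves some CPU for the kubelet, and the kube-system pods running on every node request more on top of that, so on a 1-vCPU node there is well under one full core left for the driver. This shows up per node with something like:

# per-node capacity vs. what the scheduler can actually hand out
kubectl describe nodes | grep -A 5 Allocatable

# per-node summary of CPU/memory already requested by running pods
kubectl describe nodes | grep -A 8 'Allocated resources'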

By lowering the driver pod's CPU request, it runs fine on GKE.
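For example (the exact property depends on the Spark version: on 2.3/2.4 the driver pod's CPU request is taken from spark.driver.cores, while newer releases also accept spark.kubernetes.driver.request.cores with millicpu values such as 500m, which takes precedence if set), the submit script from the question would become something like:

#! /bin/bash
# Sketch: same submit as in the question, with the driver's CPU request lowered to half a core
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.driver.cores=0.5 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.app.name=app-name \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar

An alternative is to create a node pool with larger machines so that a full 1-CPU request fits on a single node.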

-- dex
Source: StackOverflow