I'm trying to submit a Spark job through spark-submit to a Google Kubernetes Engine (GKE) cluster.
The Docker image is built from the official Spark Dockerfile from the 2.3.0 release.
The following is the submit script.
#! /bin/bash
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.app.name=app-name \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar
I can run this on my local Minikube without any problems.
However, when I submit it to my GKE cluster, the driver pod always stays unscheduled due to insufficient CPU:
0/3 nodes are available: 3 Insufficient cpu.
kubectl describe node looks fine, and here is the describe output for the problematic pod (a quick way to compare its request against the nodes' allocatable resources is sketched after it):
Name:           spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
Namespace:      default
Node:           <none>
Labels:         spark-app-selector=spark-3e8ff877bebd46be9fc8d956531ba186
                spark-role=driver
Annotations:    spark-app-name=spark-pi
Status:         Pending
IP:
Containers:
  spark-kubernetes-driver:
    Image:       geekbeta/spark:v2
    Port:        <none>
    Host Port:   <none>
    Args:
      driver
    Limits:
      memory:  1408Mi
    Requests:
      cpu:     1
      memory:  1Gi
    Environment:
      SPARK_DRIVER_MEMORY:        1g
      SPARK_DRIVER_CLASS:         ExpletivePI
      SPARK_DRIVER_ARGS:
      SPARK_DRIVER_BIND_ADDRESS:  (v1:status.podIP)
      SPARK_MOUNTED_CLASSPATH:    /opt/spark/tang_stuff/spark-demo.jar:/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_0:           -Dspark.app.name=spark-pi
      SPARK_JAVA_OPT_1:           -Dspark.app.id=spark-3e8ff877bebd46be9fc8d956531ba186
      SPARK_JAVA_OPT_2:           -Dspark.driver.host=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver-svc.default.svc
      SPARK_JAVA_OPT_3:           -Dspark.submit.deployMode=cluster
      SPARK_JAVA_OPT_4:           -Dspark.driver.blockManager.port=7079
      SPARK_JAVA_OPT_5:           -Dspark.kubernetes.executor.podNamePrefix=spark-pi-e890cd00394b3b20942f22d0a9173c1c
      SPARK_JAVA_OPT_6:           -Dspark.master=k8s://https://35.229.152.59
      SPARK_JAVA_OPT_7:           -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark
      SPARK_JAVA_OPT_8:           -Dspark.executor.instances=1
      SPARK_JAVA_OPT_9:           -Dspark.kubernetes.container.image=geekbeta/spark:v2
      SPARK_JAVA_OPT_10:          -Dspark.kubernetes.driver.pod.name=spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver
      SPARK_JAVA_OPT_11:          -Dspark.jars=/opt/spark/tang_stuff/spark-demo.jar,/opt/spark/tang_stuff/spark-demo.jar
      SPARK_JAVA_OPT_12:          -Dspark.driver.port=7078
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from spark-token-9gdsb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  spark-token-9gdsb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-token-9gdsb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m (x125 over 38m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
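For reference, this is roughly how I compared what the driver pod asks for with what the nodes can actually offer (a sketch using standard kubectl commands; the pod name is the one from my cluster and will differ on yours):

# Allocatable CPU per node; on a 1-vCPU GKE node this is below 1000m,
# since the kubelet and system pods already reserve a share of it
kubectl describe nodes | grep -A 5 Allocatable

# CPU and memory already requested on each node by running pods
kubectl describe nodes | grep -A 8 "Allocated resources"

# Scheduling events for the stuck driver pod
kubectl describe pod spark-pi-e890cd00394b3b20942f22d0a9173c1c-driver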
My cluster has 3 CPUs and 11 GB of RAM in total, so I'm really confused and don't know what to do at this point. Any advice or comments would be greatly appreciated. Thank you in advance!
Problem solved: it turns out that the driver pod requests 1 full CPU by default, which my GKE cluster cannot accommodate, since each node has only one CPU and part of it is already reserved for system pods, so no node ever has a full CPU left to allocate.
By lowering the driver pod's CPU request, it can run on GKE.
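For anyone else hitting this, here is a sketch of the adjusted submit command. It assumes that on Spark 2.3 the driver's CPU request is taken from spark.driver.cores (newer releases also expose spark.kubernetes.driver.request.cores), so a fractional value lets the driver fit on a 1-vCPU node:

#! /bin/bash
spark-submit \
--master k8s://https://<master url> \
--deploy-mode cluster \
--conf spark.executor.instances=1 \
--conf spark.driver.cores=0.5 \
--conf spark.kubernetes.driver.limit.cores=0.5 \
--conf spark.kubernetes.container.image=<official image> \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--class ExpletivePI \
--name spark-pi \
local:///opt/spark/examples/spark-demo.jar

If the executor pods also fail to schedule, the same idea should apply via spark.executor.cores (and spark.kubernetes.executor.limit.cores).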