Spark on K8s: executor node pending forever

2/8/2020
cpchung:small$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
simpleapp-1581273108724-driver   1/1     Running   0          93m
simpleapp-1581273108724-exec-1   0/1     Pending   0          93m

My kubectl version:

cpchung:spark-2.4.4-bin-hadoop2.7$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-21T22:17:28Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T16:55:41Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

From the driver's kubectl logs:

20/02/08 21:06:09 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at count at SimpleApp.java:12) (first 15 tasks are for partitions Vector(0))
20/02/08 21:06:09 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/02/08 21:06:24 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Even after running minikube start --kubernetes-version v1.12.0 --cpus 4 --memory 8192 to give the cluster more CPUs and memory, the executor pod's description still shows this:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  83s (x541 over 91m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.

For the driver pod:

Events:          <none>

What could go wrong here?

Here is the repo which can reproduce the problem: https://github.com/pokeai/k8s_setup/tree/master/small

-- cpchung
apache-spark
kubernetes

1 Answer

2/9/2020

As the events of your Pending pod say: 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. This means your cluster does not have enough allocatable resources left to schedule the Pod.
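You can confirm this by comparing the node's allocatable resources against what the Spark pods request. A quick sketch (assuming the default Minikube node name `minikube` and your executor pod name; adjust both to your setup):

```shell
# What the node can actually offer (allocatable CPU/memory)
kubectl describe node minikube | grep -A 7 "Allocatable"

# What the pending executor pod is requesting
kubectl get pod simpleapp-1581273108724-exec-1 \
  -o jsonpath='{.spec.containers[0].resources.requests}'
```

If the requests exceed what remains allocatable after the driver pod is scheduled, the executor stays Pending.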

To overcome this you have two options:

  • Increase the resources of the Minikube cluster by passing larger values to the cluster-creation command, e.g. minikube start ... --cpus 4 --memory 8192 (since you already tried those values, you may need to go higher still).
  • Lower the CPU and memory requests of the Spark Pods; this is a bit trickier. To see how the Driver and Executor resource requests are built, refer to the driver and executor spec setups in the Spark on Kubernetes source.
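For the second option, here is a sketch of a spark-submit invocation that shrinks the executor's requests via Spark 2.4 configuration properties (the memory/core values, image name, and jar path are illustrative placeholders, not taken from your repo):

```shell
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name simpleapp \
  --class SimpleApp \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=512m \
  --conf spark.kubernetes.executor.request.cores=0.5 \
  --conf spark.kubernetes.container.image=<your-image> \
  local:///path/to/simpleapp.jar
```

spark.kubernetes.executor.request.cores lets the executor pod request a fraction of a CPU, which is often enough to get it scheduled on a small Minikube node.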

Note that the Driver Pod's CPU request defaults to 1 core. Also take into account that a memory overhead is added on top of the requested amount for both Driver and Executor Pods.
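To see how that overhead affects scheduling: by default the executor pod's memory request is the executor memory plus max(384 MiB, 10% of the executor memory). A quick sketch of the arithmetic (the 1024 MiB figure is just an example value):

```shell
# Executor pod memory request = executor memory + overhead,
# where overhead = max(384 MiB, 10% of executor memory) by default.
EXEC_MEM_MIB=1024
TENTH=$(( EXEC_MEM_MIB / 10 ))
OVERHEAD=$(( TENTH > 384 ? TENTH : 384 ))
POD_REQUEST_MIB=$(( EXEC_MEM_MIB + OVERHEAD ))
echo "${POD_REQUEST_MIB}"   # 1408
```

So even a modest spark.executor.memory setting asks the node for noticeably more than its face value, which is worth remembering when sizing a single-node Minikube cluster.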

Hope it helps.

-- Aliaksandr Sasnouskikh
Source: StackOverflow