Node "not ready" state when sum of all running pods exceed node capacity

9/27/2019

I have 5 nodes running in a k8s cluster with around 30 pods. Some of the pods usually consume a lot of memory. At one stage we found that a node went to the "not ready" state when the sum of memory of all running pods exceeded the node's memory.

Anyhow, I increased the memory resource request to a higher value for the high-memory pods, but shouldn't the node controller kill the pods and restart them instead of putting the node into the "not ready" state?

Suppose 4 pods were already running on a node and the scheduler allowed another pod onto that node because its memory request fit within the node's remaining capacity. Now, over a period of time, the memory usage of all the pods starts increasing. Each pod is still under its individual memory limit, but the sum of all pods' memory exceeds the node's memory, and this puts the node into the "not ready" state.
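For illustration, a minimal sketch of the kind of pod spec this describes; the names, image, and values are hypothetical, the point is only the gap between what the scheduler reserves (requests) and what the pod may actually grow to (limits):

```yaml
# Hypothetical pod spec: the scheduler places the pod based on
# requests.memory, but actual usage can grow up to limits.memory,
# so the sum of limits across pods can exceed node capacity.
apiVersion: v1
kind: Pod
metadata:
  name: high-memory-app        # hypothetical name
spec:
  containers:
  - name: app
    image: my-app:latest       # hypothetical image
    resources:
      requests:
        memory: "1Gi"          # what the scheduler counts against node capacity
      limits:
        memory: "4Gi"          # what the container may actually consume
```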

Is there any way to overcome this situation?

Because of this, all the pods get rescheduled to other nodes, or some go to Pending because their resource requests are now higher.

Cluster information:
Kubernetes version: 1.10.6
Cloud being used: AWS

-- Niket Anand
amazon-web-services
kubernetes

1 Answer

9/30/2019

You can set a proper eviction threshold for memory on the kubelet and a restartPolicy in the PodSpec.

See details in https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/
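A minimal sketch of what that could look like; the values are illustrative only, not recommendations, and on clusters of that era the same thresholds could also be passed as kubelet flags (e.g. --eviction-hard=memory.available<500Mi):

```yaml
# Kubelet eviction settings (KubeletConfiguration), illustrative values:
# the kubelet starts evicting pods before the node itself runs out of memory,
# instead of the node going "not ready".
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"        # evict immediately below this threshold
evictionSoft:
  memory.available: "1Gi"          # evict after the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"
systemReserved:
  memory: "512Mi"                  # keep headroom for system daemons
---
# In the PodSpec, restartPolicy controls what happens to a container
# that gets killed (e.g. OOM-killed or evicted and rescheduled).
apiVersion: v1
kind: Pod
metadata:
  name: example                    # hypothetical
spec:
  restartPolicy: Always
  containers:
  - name: app
    image: my-app:latest           # hypothetical image
```

With eviction thresholds in place, memory pressure on the node leads to individual pods being evicted and rescheduled rather than the whole node becoming unresponsive.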

-- wineinlib
Source: StackOverflow