Kubernetes doesn't take into account total node memory usage when starting Pods

5/22/2018

What I see: Kubernetes takes into account only the memory used by its own components when scheduling new Pods and treats the remaining memory as free, even if other system processes outside Kubernetes are using it. So, when I create new deployments, it tries to schedule new Pods on a node that is already starved of memory.

What I expected to see: Kubernetes automatically takes the total memory usage into consideration (Kubernetes components + system processes) and schedules the Pod on another node.

As a work-around, is there a configuration parameter that I need to set or is it a bug?

-- Laylo
kubelet
kubernetes
linux
memory

2 Answers

5/23/2018

Yes, there are a few parameters for allocating resources: you can set memory and CPU for your pods, and you can manually reserve memory and CPU for your system daemons. The documentation explains how it works with an example:

Example Scenario

Here is an example to illustrate Node Allocatable computation:

  • Node has 32Gi of memory, 16 CPUs and 100Gi of Storage
  • --kube-reserved is set to cpu=1,memory=2Gi,ephemeral-storage=1Gi
  • --system-reserved is set to cpu=500m,memory=1Gi,ephemeral-storage=1Gi
  • --eviction-hard is set to memory.available<500Mi,nodefs.available<10%

Under this scenario, Allocatable will be 14.5 CPUs, 28.5Gi of memory and 88Gi of local storage. Scheduler ensures that the total memory requests across all pods on this node do not exceed 28.5Gi and storage doesn’t exceed 88Gi. Kubelet evicts pods whenever the overall memory usage across pods exceeds 28.5Gi, or if overall disk usage exceeds 88Gi. If all processes on the node consume as much CPU as they can, pods together cannot consume more than 14.5 CPUs.

If kube-reserved and/or system-reserved is not enforced and system daemons exceed their reservation, kubelet evicts pods whenever the overall node memory usage is higher than 31.5Gi or storage is greater than 90Gi.
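The arithmetic in the scenario above can be checked with a short Python sketch. This is only an illustration, not kubelet code: the dictionary and function names are made up here, all quantities are expressed in CPUs and Gi, and 500Mi is rounded to 0.5Gi as the documentation example does.

```python
# Values from the example scenario. CPU is in cores, memory and storage in Gi.
capacity = {"cpu": 16.0, "memory": 32.0, "storage": 100.0}
kube_reserved = {"cpu": 1.0, "memory": 2.0, "storage": 1.0}
system_reserved = {"cpu": 0.5, "memory": 1.0, "storage": 1.0}

# --eviction-hard: memory.available<500Mi (treated as 0.5Gi here) and
# nodefs.available<10% (10% of the 100Gi capacity). There is no CPU threshold.
eviction_hard = {
    "cpu": 0.0,
    "memory": 0.5,
    "storage": capacity["storage"] * 10 / 100,
}


def node_allocatable(capacity, kube_reserved, system_reserved, eviction_hard):
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold."""
    return {
        resource: capacity[resource]
        - kube_reserved[resource]
        - system_reserved[resource]
        - eviction_hard[resource]
        for resource in capacity
    }


allocatable = node_allocatable(capacity, kube_reserved, system_reserved, eviction_hard)
print(allocatable)  # {'cpu': 14.5, 'memory': 28.5, 'storage': 88.0}

# If the reservations are not enforced, only the eviction threshold protects
# the node, so eviction starts at capacity minus that threshold.
unenforced_threshold = {
    resource: capacity[resource] - eviction_hard[resource]
    for resource in ("memory", "storage")
}
print(unenforced_threshold)  # {'memory': 31.5, 'storage': 90.0}
```

The scheduler compares the sum of pod requests against the allocatable figures, which is why memory used by processes outside Kubernetes is only accounted for if it is covered by a reservation.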

You can reserve as much as you need for Kubernetes components with the --kube-reserved flag and for system daemons with the --system-reserved flag.

Additionally, if you need stricter rules for spawning pods, you could try to use Pod Affinity.

-- Nick Rak
Source: StackOverflow

5/22/2018

Kubelet has the parameter --system-reserved that allows you to make a reservation of cpu and memory for system processes.

It is not dynamic (you reserve resources only at launch), but it is the only way to tell Kubelet not to use all of the node's resources.

--system-reserved mapStringString

A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=500Mi,ephemeral-storage=1Gi) pairs that describe resources reserved for non-kubernetes components. Currently only cpu and memory are supported. See http://kubernetes.io/docs/user-guide/compute-resources for more detail. [default=none]
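To make the ResourceName=ResourceQuantity format concrete, here is a minimal Python sketch that splits such a value into pairs. The helper name parse_reserved is hypothetical and this is not how kubelet itself parses the flag; it only shows the shape of the string and leaves the quantities (200m, 500Mi, 1Gi) as uninterpreted text.

```python
def parse_reserved(value):
    """Split 'name=quantity,name=quantity' into a dict of strings.

    Hypothetical helper for illustration; quantities are not validated.
    """
    pairs = {}
    for item in value.split(","):
        name, sep, quantity = item.partition("=")
        name, quantity = name.strip(), quantity.strip()
        if not sep or not name or not quantity:
            raise ValueError(f"expected ResourceName=ResourceQuantity, got {item!r}")
        pairs[name] = quantity
    return pairs


# The example value quoted from the flag description.
reserved = parse_reserved("cpu=200m,memory=500Mi,ephemeral-storage=1Gi")
print(reserved)
```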

-- Ignacio Millán
Source: StackOverflow