What I see: Kubernetes takes into account only the memory used by its components when scheduling new Pods, and considers the remaining memory as free, even if it's being used by other system processes outside Kubernetes. So, when creating new deployments, it attempts to schedule new pods on a suffocated node.
What I expected to see: Kubernetes automatically takes the total memory usage into consideration (Kubernetes components + system processes) and schedules the pod on another node.
As a work-around, is there a configuration parameter that I need to set or is it a bug?
Yes, there are a few parameters for allocating resources: you can set memory and CPU for your pods, and you can manually reserve memory and CPU for your system daemons. The documentation explains how this works with an example:
Here is an example to illustrate Node Allocatable computation:
Node has 32Gi of memory, 16 CPUs and 100Gi of Storage
--kube-reserved is set to cpu=1,memory=2Gi,ephemeral-storage=1Gi
--system-reserved is set to cpu=500m,memory=1Gi,ephemeral-storage=1Gi
--eviction-hard is set to memory.available<500Mi,nodefs.available<10%
Under this scenario, Allocatable will be 14.5 CPUs, 28.5Gi of memory and 88Gi of local storage. Scheduler ensures that the total memory requests across all pods on this node does not exceed 28.5Gi and storage doesn’t exceed 88Gi. Kubelet evicts pods whenever the overall memory usage across pods exceeds 28.5Gi, or if overall disk usage exceeds 88Gi. If all processes on the node consume as much CPU as they can, pods together cannot consume more than 14.5 CPUs.
If kube-reserved and/or system-reserved is not enforced and system daemons exceed their reservation, kubelet evicts pods whenever the overall node memory usage is higher than 31.5Gi or storage is greater than 90Gi.
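To make the arithmetic in the quoted example easier to check, here is a minimal Python sketch that recomputes it. It is only an illustration, not how kubelet does it: the helper names (parse_quantity, parse_pairs, parse_eviction, allocatable) are hypothetical, only the quantity suffixes used in the example are handled, and the mapping of nodefs.available to ephemeral-storage is an assumption made for this calculation. Note that the exact memory figure is slightly above 28.5Gi because 500Mi is a bit less than 0.5Gi; the docs round it.

```python
from fractions import Fraction

# Binary suffixes used by Kubernetes quantities, expressed in bytes.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
GI = 2**30


def parse_quantity(text):
    """Parse a small subset of Kubernetes quantities: '1', '500m', '2Gi'."""
    for suffix, factor in _BINARY.items():
        if text.endswith(suffix):
            return Fraction(text[:-2]) * factor
    if text.endswith("m"):  # milli-units, e.g. 500m CPU = 0.5 CPU
        return Fraction(text[:-1]) / 1000
    return Fraction(text)


def parse_pairs(flag_value):
    """Split 'cpu=1,memory=2Gi' into a dict of resource name -> quantity."""
    pairs = (pair.split("=", 1) for pair in flag_value.split(","))
    return {name: parse_quantity(value) for name, value in pairs}


def parse_eviction(flag_value, capacity):
    """Resolve 'signal<threshold' pairs; percentages are taken of capacity."""
    # Assumption for this sketch: which resource each signal protects.
    signal_resource = {"memory.available": "memory",
                       "nodefs.available": "ephemeral-storage"}
    thresholds = {}
    for item in flag_value.split(","):
        signal, value = item.split("<", 1)
        resource = signal_resource[signal]
        if value.endswith("%"):
            thresholds[resource] = capacity[resource] * Fraction(value[:-1]) / 100
        else:
            thresholds[resource] = parse_quantity(value)
    return thresholds


def allocatable(capacity, kube_reserved, system_reserved, eviction_hard):
    """capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return {r: capacity[r] - kube_reserved.get(r, 0)
               - system_reserved.get(r, 0) - eviction_hard.get(r, 0)
            for r in capacity}


capacity = parse_pairs("cpu=16,memory=32Gi,ephemeral-storage=100Gi")
kube = parse_pairs("cpu=1,memory=2Gi,ephemeral-storage=1Gi")
system = parse_pairs("cpu=500m,memory=1Gi,ephemeral-storage=1Gi")
eviction = parse_eviction("memory.available<500Mi,nodefs.available<10%", capacity)

alloc = allocatable(capacity, kube, system, eviction)
print(float(alloc["cpu"]))                       # 14.5
print(round(float(alloc["memory"] / GI), 1))     # 28.5
print(float(alloc["ephemeral-storage"] / GI))    # 88.0

# Without enforcement, eviction is driven by overall node usage:
# only the eviction threshold is subtracted from capacity.
node_level = {r: capacity[r] - eviction[r] for r in eviction}
print(round(float(node_level["memory"] / GI), 1))      # 31.5
print(float(node_level["ephemeral-storage"] / GI))     # 90.0
```

Fractions are used instead of floats so that 500m and 10% stay exact until the final print.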
You can reserve as much as you need for Kubernetes with the --kube-reserved flag and for the system with the --system-reserved flag.
Additionally, if you need stricter rules for spawning pods, you could try to use Pod Affinity.
Kubelet has the --system-reserved parameter, which lets you reserve CPU and memory for system processes.
It is not dynamic (the reservation is fixed when kubelet starts), but it is the only way to tell Kubelet not to use all the resources on the node.
--system-reserved mapStringString
A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=500Mi,ephemeral-storage=1Gi) pairs that describe resources reserved for non-kubernetes components. Currently only cpu and memory are supported. See http://kubernetes.io/docs/user-guide/compute-resources for more detail. [default=none]
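As a rough illustration of the ResourceName=ResourceQuantity format described above, the following Python sketch builds the flag value from a dict and rejects unexpected resource names. The function name format_reserved_flag and the ALLOWED set are hypothetical; the set simply lists the three names that appear in the example in the flag description, and is not taken from kubelet itself.

```python
# Assumption: resource names taken from the example in the flag description.
ALLOWED = {"cpu", "memory", "ephemeral-storage"}


def format_reserved_flag(flag, reserved):
    """Build e.g. '--system-reserved=cpu=200m,memory=500Mi' from a dict."""
    unknown = set(reserved) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected resource names: {sorted(unknown)}")
    pairs = ",".join(f"{name}={quantity}" for name, quantity in reserved.items())
    return f"--{flag}={pairs}"


system_flag = format_reserved_flag(
    "system-reserved", {"cpu": "200m", "memory": "500Mi"})
print(system_flag)  # --system-reserved=cpu=200m,memory=500Mi
```

The quantities are passed through as strings, so any validation of the values themselves is left to kubelet.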