We have a cluster where nodes are running out of resources, leading to slowness and over-committing issues, which forces us to restart the nodes frequently. We are planning to enforce a node allocatable resource policy to split the available CPU, memory, and ephemeral storage between the system, the kubelet, and application pods.
I came across some guidelines on allocatable resource calculation here.
They do not specify how the resources are split between system and kubelet. Also, we are on OpenShift, so I am not sure how much of this is applicable.
As you mentioned, you are using OpenShift, and the docs you provided are from GCP. Default requirements and parameters might be different due to cloud-provider specifics. Unfortunately I am not an OpenShift user, but you can find the following in the OpenShift documentation:
Resources reserved for node components are based on two node settings: kube-reserved and system-reserved.
You can set these in the kubeletArguments section of the node configuration file (the /etc/origin/node/node-config.yaml file by default) using a set of <resource_type>=<resource_quantity> pairs (e.g., cpu=200m,memory=512Mi).
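For example, a sketch of what that section could look like (based on the OpenShift 3.x node configuration format; the reservation values below are made-up examples you should tune for your own nodes):

```yaml
kubeletArguments:
  kube-reserved:
    - "cpu=200m,memory=512Mi"
  system-reserved:
    - "cpu=200m,memory=512Mi"
```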
How to compute Allocated Resources?
An allocated amount of a resource is computed based on the following formula:
[Allocatable] = [Node Capacity] - [kube-reserved] - [system-reserved]
If [Allocatable] is negative, it is set to 0.
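As a quick sanity check, you can work through the formula by hand; a minimal sketch in Python (the capacity and reservation numbers are made-up example values in Mi):

```python
def allocatable(node_capacity, kube_reserved, system_reserved):
    """[Allocatable] = [Node Capacity] - [kube-reserved] - [system-reserved],
    floored at 0 when over-reserved."""
    return max(0, node_capacity - kube_reserved - system_reserved)

# A node with 8192 Mi of memory, reserving 512 Mi each for kubelet and system:
print(allocatable(8192, 512, 512))  # 7168 Mi left for pods
# Over-reserving clamps the result to 0 rather than going negative:
print(allocatable(512, 512, 512))   # 0
```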
Please check the following OpenShift docs: Allocating node resources, Capacity management, Cluster Limits, and Resource Limits.
Many factors depend on exactly which pods/images you want to use. Some images might request 0.1 CPU while others might need 1 CPU to start.
You can limit it by creating a Quota and setting Pod requests and limits.
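For example, a ResourceQuota sketch (the namespace name and all values here are assumptions; adjust them to your environment):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: my-project
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```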
Please keep in mind that you can always check the current Requests/Limits of each Pod, under Containers.<containerName>.Requests:
$ oc describe pod <pod-name>
Or the requested resources / limits on a node:
$ oc describe node <node-name>
At the bottom of this description you should see all pods' requests and limits:
Non-terminated Pods:         (6 in total)
  Namespace    Name                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                                            ------------  ----------  ---------------  -------------  ---
  default      nginx-7cdbd8cdc9-b94r9                          100m (10%)    0 (0%)      0 (0%)           0 (0%)         6m2s
  default      nginx-7cdbd8cdc9-nlsw7                          100m (10%)    0 (0%)      0 (0%)           0 (0%)         6m2s
  kube-system  fluentd-gcp-v3.2.0-lwnqn                        100m (10%)    1 (106%)    200Mi (7%)       500Mi (18%)    5h22m
  kube-system  kube-proxy-gke-stc1-default-pool-094e5c74-4dzj  100m (10%)    0 (0%)      0 (0%)           0 (0%)         5h22m
  kube-system  prometheus-to-sd-lbj57                          1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      5h22m
  kube-system  traefik-749d86f748-frs7q                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         158m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        401m (42%)  1003m (106%)
  memory                     220Mi (8%)  520Mi (19%)
  ephemeral-storage          0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0
Hope it helps.