Request vs limit CPU in kubernetes/openshift

2/22/2019

I have a dilemma about choosing the right request and limit settings for a pod in OpenShift. Some data:

  1. During startup, the application requires at least 600 millicores to be able to pass the readiness check within 150 seconds.
  2. After startup, 200 millicores should be sufficient for the application to stay in an idle state.

So my understanding from the documentation:

CPU Requests

Each container in a pod can specify the amount of CPU it requests on a node. The scheduler uses CPU requests to find a node with an appropriate fit for a container. The CPU request represents a minimum amount of CPU that your container may consume, but if there is no contention for CPU, it can use all available CPU on the node. If there is CPU contention on the node, CPU requests provide a relative weight across all containers on the system for how much CPU time the container may use. On the node, CPU requests map to Kernel CFS shares to enforce this behavior.

I note that the scheduler refers to the requested CPU to perform allocation on the node, and that once allocated it is a guaranteed resource. On the other hand, I might be allocating extra CPU, as the 600 millicores might only be required during startup.

So should I go for

resources:
    limits:
      cpu: 1
    requests:
      cpu: 600m

for a guaranteed resource, or

resources:
    limits:
      cpu: 1
    requests:
      cpu: 200m 

for better CPU savings?

-- bLaXjack
kubernetes
openshift

1 Answer

2/22/2019

I think you didn't quite get the idea of Requests vs Limits; I would recommend you take a look at the docs before you make that decision.

In a brief explanation:

A Request is how much of a resource will be virtually allocated to the container; it is a guarantee that you can use it when you need it, but it does not mean it is kept reserved exclusively for the container. That said, if you request 200MB of RAM but only use 100MB, the other 100MB will be "borrowed" by other containers when they consume all of their requested memory, and it will be "claimed back" when your container needs it.

A Limit, in simple terms, is how much the container can consume, requested plus borrowed from other containers, before it is shut down for consuming too many resources.

  1. If a Container exceeds its memory limit, it will probably be terminated.
  2. If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.

In simple terms, the limit is an absolute value; it should be equal to or higher than the request. Good practice is to avoid setting limits higher than requests for all containers, and to do so only for workloads that really need it. This is because when most containers consume more resources (i.e. memory) than they requested, pods will suddenly start to be evicted from the node in an unpredictable way, which is worse than having a fixed limit for each one.
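As a purely illustrative sketch (the values are made up, not a recommendation), a container following that practice, with the limit equal to or slightly above the request, could look like:

resources:
  requests:
    memory: 200Mi   # guaranteed to the container when it needs it
  limits:
    memory: 300Mi   # hard cap; exceeding it can get the container terminated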

There is also a nice post in the Docker docs about resource limits.

The scheduling rule is the same for CPU and memory: K8s will only assign a pod to a node if the node has enough allocatable CPU and memory to fit all the resources requested by the containers within the pod.
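For illustration (the pod name, images, and numbers below are made up), with a pod like this the scheduler will only pick a node that still has at least the sum of all container requests available, here 700m CPU and 384Mi memory:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod                 # hypothetical name
spec:
  containers:
  - name: app
    image: example/app:latest       # hypothetical image
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
  - name: sidecar
    image: example/sidecar:latest   # hypothetical image
    resources:
      requests:
        cpu: 200m
        memory: 128Mi
# the scheduler needs a node with >= 700m CPU and >= 384Mi memory allocatable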

The execution rule is a bit different:

Memory is a finite resource on the node, and its capacity is an absolute limit; the containers can't consume more memory than the node has.

CPU, on the other hand, is measured as CPU time. When you reserve CPU capacity, you are specifying how much CPU time a container can use; if the container needs more time than it requested, it can be throttled and put into an execution queue until other containers have consumed their allocated time or finished their work. In summary it is very similar to memory, but it is very unlikely that a container will be killed for consuming too much CPU. A container will be able to use more CPU when the other containers do not use the full CPU time allocated to them. The main issue is that when a container uses more CPU than it was allocated, the throttling will degrade the performance of the application, and at a certain point it might stop working properly. If you do not provide limits, the containers will start affecting other resources on the node.

Regarding the values to use, there is no single right value or formula; each application requires a different approach, and only by measuring multiple times can you find the right values. The advice I give you is to identify the min and the max and pick something in the middle, then keep monitoring to see how it behaves; if you feel it is wasting/lacking resources you can reduce/increase towards an optimal value. If the service is something crucial, start with higher values and reduce them afterwards.
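To make that concrete with your numbers (200m at idle, 600m during startup), a middle-ground starting point might look like the sketch below; treat it only as something to monitor and adjust, not as a final answer:

resources:
  requests:
    cpu: 400m    # roughly between the idle (200m) and startup (600m) figures
  limits:
    cpu: 1       # allows bursting up to a full core, e.g. during startup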

As for the readiness check, you should not use it as a parameter for setting these values; instead, you can delay readiness using the initialDelaySeconds parameter in the probe to give the pod's containers extra time to start.
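For example, assuming an HTTP readiness endpoint (the path and port here are hypothetical), the probe could be delayed like this instead of over-requesting CPU just for startup:

readinessProbe:
  httpGet:
    path: /healthz             # hypothetical endpoint
    port: 8080                 # hypothetical port
  initialDelaySeconds: 150     # give the container time to start before probing
  periodSeconds: 10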

PS: I quoted the terms "borrowed" and "claimed back" because a container is not actually borrowing from another container. In general, the node has a pool of memory and gives a chunk of it to a container when it needs it, so the memory is not technically borrowed from another container but from the pool.

-- Diego Mendes
Source: StackOverflow