How to handle CPU contention for burstable k8s pods?

10/9/2019

The use case I'm trying to get my head around takes place when you have various burstable pods scheduled on the same node. How can you ensure that the workload in a specific pod takes priority over another pod when the node's kernel is scheduling CPU and the CPU is fully burdened? On a typical Linux host my thoughts on contention between processes immediately go to the 'niceness' of the processes; however, I don't see an equivalent k8s mechanism for specifying CPU scheduling priority between the processes within pods on a node.

I've read about the newest capabilities provided by k8s, which (if I interpret the documentation correctly) just provide a mechanism for CPU pinning to pods, and that doesn't really scratch my itch. I'd still like to maximize CPU utilization by the "second class" pods if the higher-priority pods don't have an active workload, while allowing the higher-priority workload to have CPU scheduling priority should the need arise.

So far, having not found a satisfactory answer, I'm thinking that the community will opt for an architectural solution, like auto-scaling or segregating the workloads between nodes. I don't consider these to truly address the issue; they're really just throwing more CPUs at it, which is what I'd like to avoid. Why spin up more nodes when you've got idle CPU?

-- Jon Sherry
kubernetes

3 Answers

10/9/2019

Let me first explain how CPU allocation and utilization happen in k8s (memory is a bit different).

You define CPU requirements as below, where CPU is expressed in millicores, i.e. thousandths of a core.

resources:
  requests:
    cpu: 50m
  limits:
    cpu: 100m

In the above example, we ask for a minimum of 5% and a maximum of 10% of one core's CPU time.

Requests are used by Kubernetes to schedule the pod: the pod is only placed on a node that still has at least 50m of unrequested allocatable CPU.

The requests and limits are passed to Docker (or any other runtime), which configures the pod's cgroup: requests become cpu.shares, while limits are enforced through the CFS quota (cpu.cfs_quota_us / cpu.cfs_period_us).

So if you request 5% of CPU and use only 1%, the remainder is not locked to this pod; other pods can use the free CPU, which keeps overall CPU utilization of the node high.

If you set a limit of 10% and then try to use more than that, Linux will throttle the CPU usage, but it won't kill the pod.

So, coming to your question: give your higher-priority burstable pods larger requests (and limits). As long as all pods don't burst at the same time you are fine; if they do burst at the same time, the available CPU is divided between them in proportion to their requests (their cpu.shares).
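As a rough sketch (pod names, images, and the exact numbers are made up for illustration), two such pods could be declared like this; both are allowed to burst, but the one with the larger request wins proportionally more CPU time when the node is fully busy:

# Illustrative only: names and images are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-workload     # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
    resources:
      requests:
        cpu: 500m                  # larger request => larger cpu.shares weight under contention
      limits:
        cpu: "2"                   # allowed to burst up to 2 cores
---
apiVersion: v1
kind: Pod
metadata:
  name: second-class-workload      # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image
    resources:
      requests:
        cpu: 100m                  # smaller request => smaller share when CPU is contended
      limits:
        cpu: "2"                   # may still use idle CPU when the other pod is quiet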

You can also use pod affinity and anti-affinity to schedule the burstable pods onto different nodes.
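If you do want to separate the workloads that way, a sketch of the anti-affinity part of a Pod spec might look like the following; the workload-tier label and its values are assumptions, while kubernetes.io/hostname is the standard per-node topology key:

# Sketch only: the workload-tier label and its values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: burstable-worker
  labels:
    workload-tier: burstable
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: workload-tier
            operator: In
            values:
            - high-priority        # keep away from nodes running high-priority pods
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx                   # placeholder image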

-- yogesh kunjir
Source: StackOverflow

10/9/2019

As already mentioned, resource management in Pods is declared with requests and limits.

There are 3 QoS Classes in Kubernetes based on requests and limits configuration:

  1. Guaranteed (limits == requests)
  2. Burstable (limits > requests)
  3. Best Effort (limits and requests are unspecified)

Both 2) and 3) might be considered "burstable" in the sense that they may consume more resources than requested.

The closest fit for your case might be using the Burstable class for higher-priority Pods and Best Effort for all the others.
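For illustration (names and images are placeholders), a Burstable pod and a Best Effort pod could look like this; the first sets requests lower than limits, the second sets no resources at all:

# Sketch only: names and images are made up.
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod              # QoS class: Burstable (requests < limits)
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 100m
      limits:
        cpu: 500m
---
apiVersion: v1
kind: Pod
metadata:
  name: best-effort-pod            # QoS class: BestEffort (no requests or limits)
spec:
  containers:
  - name: app
    image: nginx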

-- esboych
Source: StackOverflow

10/9/2019

The CPU request correlates to cgroup CPU priority. Basically if Pod A has a request of 100m CPU and Pod B has 200m, even in a starvation situation B will get twice as many run seconds as A.
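A minimal sketch of that scenario, in the same request-only style as the snippet above (the Pod A / Pod B labels in the comments are hypothetical):

resources:          # Pod A
  requests:
    cpu: 100m       # cgroup weight is half of Pod B's
---
resources:          # Pod B
  requests:
    cpu: 200m       # gets roughly twice Pod A's CPU time under full contention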

-- coderanger
Source: StackOverflow