Why would Kubernetes OOM-kill a pause container?

9/9/2019

Occasionally our Kubernetes cluster (1.12.7 on GKE) emits an event like:

1 OOMKilling: Memory cgroup out of memory: Kill process 605624 (pause) score 0 or sacrifice child
Killed process 605624 (pause) total-vm:1024kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB

At exactly the same time we see an event from Docker:

datadog/agent@sha256:904c18135ec534fc81c29a42537725946b23ba19ac3c0a8b2e942fe39981af20 1 oom 1 kill 1 die 1 stop 1 destroy 1 create 1 start on gke-prod-pool-1-dc20d284-sjwm...

OOM     k8s_datadog-agent_datadog-agent-76v8c_default_bf678779-c318-11e9-b064-42010a9a0059_7  
KILL    k8s_datadog-agent_datadog-agent-76v8c_default_bf678779-c318-11e9-b064-42010a9a0059_7  
...

The datadog-agent pod is also restarted.

This particular pod is configured with Guaranteed QoS. According to the table under Node OOM Behavior in the Kubernetes out-of-resource docs, a guaranteed pod should have an oom score of -998.
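As an aside, the QoS class Kubernetes actually assigned can be read straight from the pod status; a minimal check (pod name taken from the Docker events above):

    kubectl -n default get pod datadog-agent-76v8c -o jsonpath='{.status.qosClass}'
    # prints "Guaranteed" when every container's requests equal its limits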

Why is the pause container being killed here? And since it appears to be the pause container for the datadog-agent pod, why is the oom score 0, not -998?

-- Matt
google-kubernetes-engine
kubernetes
out-of-memory

1 Answer

9/9/2019

Kubernetes has a mechanism for specifying minimum and maximum memory requirements for containers; for example, you can have:

    resources:
      limits:
        memory: 150M
      requests:
        memory: 50M

When a container crosses its memory "limit", it gets killed and reported as OOMKilled; it doesn't really matter whether the node still has memory to spare. As you already investigated, this feeds the OOM killer, which ranks processes by score: the higher the score, the more likely that process is to be terminated, and vice versa, the lower the score, the less likely it is to be sent to the slaughterhouse (hence the negative values for processes you want to preserve). That is not all there is to it, though: the kernel also exposes oom_score_adj, which can be tweaked to, let's say, artificially raise or lower the score and so control whether a given process stands near the front or the back of the killing line.
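Both numbers are visible directly on the node for any running process; a minimal sketch (substitute a real PID; the one from your kernel log is gone once the kill happens):

    # on the node: <pid> is any running process you want to inspect
    cat /proc/<pid>/oom_score        # the badness score the OOM killer compares
    cat /proc/<pid>/oom_score_adj    # the user-space adjustment (what the kubelet/runtime sets)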

When a memory limit is set for a pod, Kubernetes assigns an oom_score_adj value based on, as you correctly state, the QoS class (Burstable), to ensure that specific container is the one selected and killed by the OOM killer.
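To tie that back to the pause process from your log: on a Docker-based node, the pod's sandbox ("pause") container is the one named k8s_POD_<pod>_<namespace>_..., so a rough way to see the adjustment it actually received (container name pattern taken from your events above, placeholders to fill in) is:

    # on the node: find the pause/sandbox container of the pod
    docker ps --filter name=k8s_POD_datadog-agent-76v8c_default --format '{{.ID}}'
    # look up its PID and read the adjustment the kubelet applied
    docker inspect --format '{{.State.Pid}}' <container-id>
    cat /proc/<pid>/oom_score_adj

Comparing that value with the -998 you expect makes it clear whether the adjustment itself is the surprising part.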

Finally, to address the question of why it gets that score, this table may be of use:

    +------------------+------------------------+
    | QoS class        | oom_score_adj          |
    +------------------+------------------------+
    | Guaranteed       | -998                   |
    +------------------+------------------------+
    | Best effort      | 1000                   |
    +------------------+------------------------+
    | Burstable        | formula (below)        |
    +------------------+------------------------+

where the oom_score_adj formula for Burstable is:

 min( max(2, 1000 - (1000 * mRB) / mMCB), 999)
 mRB = Memory Request Bytes.
 mMCB = Machine Memory Capacity Bytes
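Plugging illustrative numbers into that formula (these are assumptions, not your actual values): a container requesting 50M of memory on a node with 4 GiB of capacity lands at roughly:

    mRB  = 50M ≈ 50,000,000 bytes
    mMCB = 4Gi = 4,294,967,296 bytes
    1000 - (1000 * 50,000,000) / 4,294,967,296 ≈ 988
    min(max(2, 988), 999) = 988

which sits near the top of the range, far from a Guaranteed pod's -998, so such a process is among the first candidates when the memory cgroup runs out.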

And since the QoS class at play here on GKE/Kubernetes is Burstable, well, that's how it ends up in line for the killing room.

I trust this information will be useful for you, but I also recommend checking out the following links: [1] Node OOM Behavior (kubernetes.io) and [2] Container and pod memory assignment.

Regards!

[1] https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior

[2] https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-containers-memory-limit

-- JorgeHPM
Source: StackOverflow