kubernetes: Elaboration of OOM message in cluster events

9/23/2019

I am working on GKE.

Sometimes I see the following message in my GKE Cluster Operations logs (which I assume report the counterpart of kubectl get events):

Memory cgroup out of memory: Kill process 670233 (java) score 1728 or sacrifice child
Killed process 670233 (java) total-vm:16535056kB, anon-rss:10437020kB, file-rss:20112kB, shmem-rss:0kB

The specific pod has the following resources

          limits:
            cpu: 4096m
            memory: 10Gi
          requests:
            cpu: 1024m
            memory: 8Gi

Should I assume the OOM kill takes place because the pod is exceeding 10Gi (which is its memory limit)?

My question stems from the fact that, as I understand it, when a pod reaches its memory limit, the kill should be handled by the kubelet.

The above, however, seems to be a node-level event (a cgroup-related event, to be more precise).

Or could it just be a coincidence that the sum of the resident set sizes (*-rss) comes out right around 10Gi, which is the pod's limit?
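
For reference, the resident-set figures from the message add up to just under the limit; a quick back-of-the-envelope check using the KiB values reported by the OOM killer:

# anon-rss + file-rss + shmem-rss from the OOM message, in KiB:
echo $(( 10437020 + 20112 + 0 ))   # 10457132 KiB, roughly 9.97Gi
# The pod's memory limit, also in KiB:
echo $(( 10 * 1024 * 1024 ))       # 10485760 KiB, i.e. 10Gi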

-- pkaramol
google-kubernetes-engine
kubernetes

1 Answer

9/24/2019

This question was for the most part already answered here by Rico:


I believe this is actually answered here

If you check the Linux kernel code here, you'll see:

/*
 * If any of p's children has a different mm and is eligible for kill,
 * the one with the highest oom_badness() score is sacrificed for its
 * parent.  This attempts to lose the minimal amount of work done while
 * still freeing memory.
 */

mm means 'memory management'; in this code it refers to the process's memory descriptor (mm_struct), i.e. its address space.

The only difference here is that this kill is triggered by the memory cgroup, because you have most likely run into the container's memory limit.
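
To see that connection on the node itself, you can inspect the memory cgroup that the kubelet and container runtime create for the container. The sketch below assumes cgroup v1 and the default kubepods hierarchy; the exact path depends on the pod's QoS class and the container runtime, and <POD_UID> / <CONTAINER_ID> are placeholders:

# Hard limit of the container's memory cgroup; this is where the 10Gi
# from the pod spec ends up (10Gi = 10737418240 bytes):
cat /sys/fs/cgroup/memory/kubepods/burstable/pod<POD_UID>/<CONTAINER_ID>/memory.limit_in_bytes

# Current charge against that limit (RSS plus page cache and kernel memory):
cat /sys/fs/cgroup/memory/kubepods/burstable/pod<POD_UID>/<CONTAINER_ID>/memory.usage_in_bytes

# How many times the limit has been hit so far:
cat /sys/fs/cgroup/memory/kubepods/burstable/pod<POD_UID>/<CONTAINER_ID>/memory.failcnt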


And for the second part:

Should I assume the OOM kill takes place because the pod is exceeding 10Gi (which is its memory limit)?

Yes. The kubelet and container runtime write your 10Gi limit into the container's memory cgroup, and once the cgroup's usage reaches that value the kernel's OOM killer terminates a process inside the cgroup. That is why the event shows up as a kernel-level cgroup message rather than as a kubelet eviction. The resident set sizes in the message already add up to roughly 9.97Gi, and the cgroup is also charged for page cache and kernel memory, so reaching the 10Gi limit is entirely consistent with the message you are seeing.
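
You can also confirm this from the Kubernetes side. A couple of quick checks (pod and container names below are placeholders; the OOMKilled reason is only recorded if the killed java process was the container's main process):

# Shows the container state, restart count and recent events for the pod:
kubectl describe pod <POD_NAME>

# Reason recorded for the container's last termination:
kubectl get pod <POD_NAME> \
  -o jsonpath='{.status.containerStatuses[?(@.name=="<CONTAINER_NAME>")].lastState.terminated.reason}'
# Expected output when the limit was hit: OOMKilled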

-- OhHiMark
Source: StackOverflow