We're sort of new to the whole Kubernetes world, but by now have a number of services running in GKE. Today we saw some strange behaviour though, where one of the processes running inside one of our pods was killed, even though the Pod itself had plenty of resources available, and wasn't anywhere near its limits.
The limits are defined as such:
resources:
  requests:
    cpu: 100m
    memory: 500Mi
  limits:
    cpu: 1000m
    memory: 1500Mi
Inside the pod, a Celery (Python) worker is running, and this particular one is consuming some fairly long-running tasks.
During one of these tasks, the celery process was suddenly killed, seemingly by the OOM killer. The GKE Cluster Operations logs show the following:
Memory cgroup out of memory: Kill process 613560 (celery) score 1959 or sacrifice child
Killed process 613560 (celery) total-vm:1764532kB, anon-rss:1481176kB, file-rss:13436kB, shmem-rss:0kB
The resource graph for the time period shows that neither the CPU nor the memory usage was anywhere near the limits defined for the Pod, so we're baffled as to why any OOM kill occurred. We're also a bit baffled that the process itself was killed, and not the actual Pod.
Is this particular OOM actually happening inside the OS and not in Kubernetes? And if so, is there a way to get around this particular problem?
About your statement:
We're also a bit baffled that the process itself was killed, and not the actual Pod.
Compute Resources (CPU/Memory) are configured for Containers, not for Pods.
If a Pod's container is OOM-killed, the Pod is not evicted. The underlying container is restarted by the kubelet based on its RestartPolicy. The Pod will still exist on the same node, and the Restart Count will be incremented (unless you are using RestartPolicy: Never, which is not your case).
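For illustration, here is a minimal sketch of a single-container Pod spec (the names and image are placeholders, not taken from your setup) showing that the resources block sits under each container, while restartPolicy applies to the Pod as a whole:

apiVersion: v1
kind: Pod
metadata:
  name: celery-worker                    # placeholder name
spec:
  restartPolicy: Always                  # default; the kubelet restarts an OOM-killed container in place
  containers:
    - name: worker
      image: example/celery-app:latest   # placeholder image
      resources:
        requests:
          cpu: 100m
          memory: 500Mi
        limits:
          cpu: 1000m
          memory: 1500Mi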
If you do a kubectl describe on your pod, the newly spawned container will be in Running state, but you can find the last restart cause in Last State. Also, you can check how many times it was restarted:
State:          Running
  Started:      Wed, 27 Feb 2019 10:29:09 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Wed, 27 Feb 2019 06:27:39 +0000
  Finished:     Wed, 27 Feb 2019 10:29:08 +0000
Restart Count:  5
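If you only want those two fields, a couple of kubectl one-liners can pull them straight from the pod status (the pod name and container index here are placeholders for your own single-container pod):

# Reason the previous container instance was terminated (e.g. OOMKilled)
kubectl get pod celery-worker -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# How many times the container has been restarted
kubectl get pod celery-worker -o jsonpath='{.status.containerStatuses[0].restartCount}'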
The Resource Graph visualization may deviate from the actual memory use. Because it samples at a 1-minute (mean) interval, if your memory suddenly spikes above the limit, the container can be restarted before the average memory usage is plotted on the graph as a high peak. If your Python container has short, intermittent bursts of high memory usage, it is prone to being restarted even though those values never appear in the graph.
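If you want to catch such spikes yourself, one rough option, assuming the container image ships a shell and the node uses cgroup v1 (the default on GKE at the time), is to sample the container's own cgroup memory counter every second:

# Print current memory usage (bytes) once per second; Ctrl+C to stop
kubectl exec celery-worker -- sh -c 'while true; do cat /sys/fs/cgroup/memory/memory.usage_in_bytes; sleep 1; done'
# On cgroup v2 nodes the equivalent file is /sys/fs/cgroup/memory.current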
With kubectl top you can view the last memory usage registered for the Pod. Although it would be more precise to see the memory usage at a specific point in time, keep in mind that it fetches the values from metrics-server, which has a --metric-resolution:
The interval at which metrics will be scraped from Kubelets (defaults to 60s).
If your container makes "spiky" use of memory, you may still see it being restarted without ever seeing that memory usage in kubectl top.
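For example (the pod name is again a placeholder), the --containers flag breaks the figures down per container, but the values still only reflect the last metrics-server scrape:

kubectl top pod celery-worker --containers   # per-container CPU/memory from the most recent scrape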