Kubernetes OOM pod killed because kernel memory grows too much

12/12/2018

I am working on a Java service that basically creates files on a network file system to store data. It runs in a k8s cluster on Ubuntu 18.04 LTS. When we began to limit the memory in Kubernetes (limits: memory: 3Gi), the pods began to be OOMKilled by Kubernetes.

At first we thought it was a memory leak in the Java process, but on deeper analysis we noticed that the problem is kernel memory. We validated that by looking at the file /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
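As a rough sketch of that check, the kernel memory counter can be read next to the total usage counter of the same cgroup. This assumes a cgroup v1 memory controller mounted at the path above; the function name report_kmem and its optional directory argument are illustrative, not part of any tool.

```shell
# Print kernel memory usage next to total memory usage for one cgroup.
# Assumes cgroup v1; the directory argument is a hypothetical convenience
# so the function can be pointed at another cgroup path.
report_kmem() {
    dir="${1:-/sys/fs/cgroup/memory}"
    kmem=$(cat "$dir/memory.kmem.usage_in_bytes")
    total=$(cat "$dir/memory.usage_in_bytes")
    echo "kmem_bytes=$kmem total_bytes=$total"
}
```

Run inside the container, something like `while true; do report_kmem; sleep 10; done` would show whether the kernel share keeps growing while files are being created.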

We isolated the case to only creating files (without Java) with the dd command, like this:

for i in {1..50000}; do dd if=/dev/urandom bs=4096 count=1 of=file$i; done

With the dd command we saw the same thing happen (the kernel memory grew until OOM). After k8s restarted the pod, running kubectl describe pod showed:

  • Last State: Terminated
  • Reason: OOMKilled
  • Exit Code: 143

Creating files causes the kernel memory to grow, and deleting those files causes it to decrease. But our service stores data, so it creates a lot of files continuously, until the pod is OOMKilled and restarted.

We tested limiting the kernel memory using standalone Docker with the --kernel-memory parameter and it worked as expected: the kernel memory grew to the limit and did not rise any further. But we did not find any way to do that in a Kubernetes cluster. Is there a way to limit the kernel memory in a k8s environment? Why does creating files cause the kernel memory to grow without being released?
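For reference, the standalone Docker test could look roughly like the sketch below. The 3g total limit mirrors the Kubernetes limit mentioned earlier; the wrapper name run_with_kmem_limit, the DOCKER override variable, and the 512m value and image in the usage line are assumptions for illustration. Newer Docker releases have deprecated the --kernel-memory flag, so this applies to the Docker versions of that time.

```shell
# Start a container with a total memory limit and a separate kernel memory limit.
# $1 is the kernel memory limit (e.g. 512m); remaining arguments are the image
# and command. DOCKER is a hypothetical override, useful for dry runs.
run_with_kmem_limit() {
    kmem_limit="$1"
    shift
    "${DOCKER:-docker}" run --rm --memory 3g --kernel-memory "$kmem_limit" "$@"
}
```

A call such as `run_with_kmem_limit 512m ubuntu:18.04 bash` would give a shell in which the dd loop can be repeated; setting `DOCKER=echo` first prints the command instead of running it.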

-- Pablo Hadziatanasiu
docker
kubernetes
linux-kernel
out-of-memory

1 Answer

2/10/2019

Thanks for all this info, it was very useful!

In my app, I solved this by adding a new side container that runs a cron job every 5 minutes with the following command:

echo 3 > /proc/sys/vm/drop_caches

(Note that the side container needs to run in privileged mode.)
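A minimal sketch of what the side container might run is shown here. The function name drop_caches_once and its optional target argument are hypothetical; the sync call is an addition not in the original command, included so dirty pages are written out before the caches are dropped.

```shell
# Flush dirty pages, then ask the kernel to drop the page cache plus
# dentries and inodes (value 3). Needs root in a privileged container.
# The target argument is a hypothetical override for trying this out safely.
drop_caches_once() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync
    echo 3 > "$target"
}
```

In place of cron, a loop such as `while true; do drop_caches_once; sleep 300; done` would give the same 5-minute cadence.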

It works nicely and has the advantage of being predictable: every 5 minutes, your memory cache will be cleared.

-- Cyrille99
Source: StackOverflow