Pod is restarted faster each time

5/15/2019

I have a pod that runs a disk-intensive Java Spring Boot service. It eventually gets OOMKilled because kernel memory (presumably inode and page cache) keeps growing until it hits the limit (3 GB). That takes approximately 2 days and is a separate issue we are investigating. The problem is that after this first restart, the container gets OOMKilled faster and faster until the pod falls into CrashLoopBackOff: it first lasts 1 hour, then less and less. kubectl top pods shows that memory went back to normal after each restart, yet the container still suddenly gets killed.
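A quick way to confirm it really is the kernel OOM killer (and not, say, a liveness probe) is to check the container's last termination state; the pod name below is a placeholder:

    # Reason reads "OOMKilled" when the cgroup memory limit was hit
    kubectl describe pod <pod-name> | grep -A 5 'Last State'
    kubectl get pod <pod-name> \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'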

So my questions are:

  1. K8s sends a kill signal to the main process of the container, so the pod itself is not replaced, right? What happens with the pod's resources? Are they cleaned up between container restarts? (See the sketch after this list.)
  2. What happens to the pod when its container is restarted? Could this be a Java leak of some kind, e.g. file handles? Java memory stays pretty low (under 800 MB), and the JVM is killed during the restart, so that should not be happening.
  3. Is it possible that the node hosting the pod does not free or clean up what the pod was using?
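Regarding question 1, a small sketch to verify that the same pod (same UID, and therefore the same pod-level cgroup) survives a container restart while the container ID changes; names are placeholders:

    # The pod UID stays constant across container restarts...
    kubectl get pod <pod-name> -o jsonpath='{.metadata.uid}{"\n"}'
    # ...while the container ID and restart count change each time
    kubectl get pod <pod-name> \
      -o jsonpath='{.status.containerStatuses[0].containerID}{"\n"}'
    kubectl get pod <pod-name> \
      -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'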

Deleting the pod with kubectl delete pod does the job, and it lasts 2 days again, probably because the node frees everything the pod was using and allocates it afresh.

We basically log the pod's memory values, taking data from the /sys/fs/cgroup/memory folder, and after the first restart the values go back to normal, but the container is still being killed.
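For reference, a sketch of the cgroup v1 counters worth logging, assuming the container has a shell and the standard cgroup v1 layout (which kernel 3.10 uses); memory.kmem.usage_in_bytes in particular separates kernel memory from rss and page cache:

    # Run inside the container (cgroup v1 layout assumed)
    CG=/sys/fs/cgroup/memory
    cat $CG/memory.usage_in_bytes        # total charged against the 3 GB limit
    cat $CG/memory.kmem.usage_in_bytes   # kernel memory: slab, dentry/inode caches
                                         # (may read 0 if kmem accounting is off)
    grep -E '^(cache|rss|mapped_file) ' $CG/memory.stat   # page cache vs anonymous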

We are using:

  • CentOS with kernel 3.10 on the nodes
  • Java 1.8 (1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
  • The JVM is configured with cgroup-awareness options, but we know Java itself is not OOMing (see the flags sketch below).
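For completeness, the cgroup-awareness flags in question: 8u191 is the first Java 8 update with the container-support backport from JDK 10, so either form below applies (values are illustrative, not our exact settings):

    # 8u191+ container support; heap capped at a fraction of the cgroup limit
    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -jar app.jar

    # Older experimental flags, superseded by the above in 8u191
    java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar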
-- Guillermo Coscarelli
docker
kubernetes

0 Answers