I have a pod that runs a Java Spring Boot service that is disk-intensive and eventually gets OOMKilled because kernel memory (we assume inode/dentry and page cache) keeps growing until it hits the 3 GB limit. That takes approximately 2 days and is a separate issue we are investigating. The real problem is that after this first restart, the container gets OOMKilled faster and faster each time until it falls into CrashLoopBackOff: first it lasts 1 hour, then less and less. `kubectl top pods` shows that memory went back to normal, but the container still suddenly gets killed.
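One way to confirm it really is the OOM killer every time is to check the last termination state the kubelet reports; a minimal sketch (the pod name is a placeholder):

```
# Last termination reason (expected: OOMKilled) and restart count
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" restarts="}{.status.containerStatuses[0].restartCount}{"\n"}'

# Same information in human-readable form
kubectl describe pod <pod-name> | grep -A 5 'Last State'
```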
So my questions are: why does the container get OOMKilled faster and faster after the first restart even though the reported memory goes back to normal, and why does deleting the pod reset the behavior while an in-place container restart does not?
Deleting the pod with `kubectl delete pod` does the job and it lasts 2 days again, probably because the node frees the pod's resources and a new pod is allocated from scratch.
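For completeness, this is roughly the reset cycle, assuming the pod is managed by a Deployment or similar controller (the pod name is a placeholder; the replacement may even land on a different node):

```
# Delete the pod; the controller creates a fresh replacement
kubectl delete pod <pod-name>

# Watch where the new pod gets scheduled (NODE column)
kubectl get pods -o wide -w
```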
We basically log the pod's memory values by reading the files under the `/sys/fs/cgroup/memory` folder, and after the first restart the values go back to normal, yet the container is still killed.
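These are the kinds of files involved; a minimal sketch assuming cgroup v1 paths inside the container:

```
# Usage that is compared against the 3 GB limit
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# Breakdown: 'cache' is page cache, 'rss' is anonymous (e.g. JVM heap) memory
grep -E '^(cache|rss|mapped_file) ' /sys/fs/cgroup/memory/memory.stat

# Kernel memory (slab, dentry/inode caches) is accounted here
cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
```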
We are using Java 1.8.0_191 (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12).