Kubernetes pods restart anomaly

7/23/2018

My Java microservices are running in a K8s cluster hosted on AWS EC2 instances.

I have around 30 microservices (a good mix of Node.js and Java 8) running in a K8s cluster. I am facing a challenge where my Java application pods get restarted unexpectedly, which leads to an increase in the application's 5xx count.

To debug this, I started a New Relic agent in the pod along with the application and got the following graph:

[screenshot: New Relic JVM memory graph]

There I can see that the Xmx value is 6 GB and my maximum usage is about 5.2 GB.

This clearly shows that the JVM is not crossing the Xmx value.

But when I describe the pod and look at the last state, it says "Reason: Error" with "Exit Code: 137".

[screenshot: kubectl describe pod output showing the last state with exit code 137]
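For reference, this is roughly how I pull the last state (the pod name is a placeholder):

    kubectl describe pod <pod-name> | grep -A 5 "Last State"
    kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState}'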

Then, on further investigation, I found that my pod's average memory usage is close to its limit all the time (allocated 9 GiB, usage ~9 GiB). I am not able to understand why memory usage is so high in the pod even though I have only one process running (the JVM), and that is restricted to 6 GiB by Xmx.

[screenshot: pod memory usage graph, close to the 9 GiB limit]
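One way I could check where the non-heap memory goes is the JVM's Native Memory Tracking; a rough sketch (the jar name and PID are placeholders):

    java -XX:NativeMemoryTracking=summary -Xmx6g -jar app.jar
    # then, against the running JVM inside the container:
    jcmd <pid> VM.native_memory summary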

When I log in to my worker nodes and check the status of the Docker containers, I can see the last container of that application in the Exited state, saying "Container exits with non-zero exit code 137".
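To confirm whether Docker marked the container as OOM-killed, a check like this on the worker node should work (the container ID is a placeholder):

    docker inspect --format '{{.State.OOMKilled}}' <container-id>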

I can see the worker node kernel logs as:

[screenshot: worker node kernel log entries showing the process being killed]

which show that the kernel is terminating my process running inside the container.
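The same kernel messages can be pulled on the worker node with something like:

    dmesg -T | grep -i -E "killed process|out of memory"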

I can see that I have a lot of free memory on my worker node.

[screenshot: worker node free memory output]

I am not sure why my pods get restarted again and again. Is this K8s behaviour or something suspicious in my infrastructure? This is forcing me to move my application from containers back to VMs, as it leads to an increase in the 5xx count.

EDIT: I am getting OOM even after increasing the memory to 12 GB.

[screenshot: OOM event after increasing the memory to 12 GB]

I am not sure why the pod is getting killed because of OOM even though the JVM Xmx is only 6 GB.

Need help!

-- Vaibhav Jain
datadog
docker
kubernetes
linux-kernel
newrelic

3 Answers

8/15/2018

In GCloud App Engine you can specify a max CPU usage threshold, e.g. 0.6. Meaning that if CPU reaches 0.6 of 100% (60%), a new instance will spawn.

I did not come across such a setting, but maybe the Kubernetes Pod/Deployment has a similar configuration parameter. Meaning, if the RAM of the pod reaches 0.6 of 100%, the pod is terminated. In your case that would be 60% of 9 GB ≈ 5.4 GB. Just some food for thought.

-- Andrey Bulezyuk
Source: StackOverflow

2/7/2019

Some older Java versions (prior to the Java 8u131 release) don't recognize that they are running in a container. So unless you specify a maximum heap size for the JVM with -Xmx, the JVM sets the maximum heap size based on the host's total memory instead of the memory available to the container, and when the process tries to allocate memory over the container's limit (defined in the pod/deployment spec), the container gets OOMKilled.

These problems might not pop up when running your Java apps in a K8s cluster locally, because the difference between the pod memory limit and the total local machine memory isn't big. But when you run them in production on nodes with more memory available, the JVM may go over your container memory limit and get OOMKilled.
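A quick way to see which maximum heap size the JVM actually picked inside the container might be something like this (the grep pattern is just illustrative):

    java -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize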

Starting from Java 8 (u131 release), it is possible to make the JVM "container-aware" so that it recognizes the constraints set by container control groups (cgroups).

For Java 8 (from the u131 release) and Java 9 you can set these experimental flags on the JVM:

-XX:+UnlockExperimentalVMOptions 
-XX:+UseCGroupMemoryLimitForHeap

They will set the heap size based on your container's cgroup memory limit, which is defined under "resources: limits" in the container definition part of the pod/deployment spec. There can still be cases of the JVM's off-heap memory growing in Java 8, so you might want to monitor that, but overall these experimental flags should handle that as well.
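A minimal sketch of how this might be wired together, assuming the flags are passed via JAVA_TOOL_OPTIONS (which the JVM picks up automatically); all names and values below are illustrative, not taken from the question:

    # container section of a pod/deployment spec (illustrative)
    containers:
    - name: my-java-app
      image: my-registry/my-java-app:latest
      env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
      resources:
        requests:
          memory: "6Gi"
        limits:
          memory: "9Gi"   # the cgroup limit the JVM sizes its heap against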

From Java 10 these experimental flags are the new default, and container support is enabled/disabled by using this flag:

 -XX:+UseContainerSupport 
 -XX:-UseContainerSupport
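For example, on Java 10+ a start command might look like the line below; the percentage and jar name are just placeholders:

    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar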
-- Alexz
Source: StackOverflow

7/23/2018

Since you have limited the maximum memory usage of your pod to 9Gi, it will be terminated automatically when the memory usage gets to 9Gi.

-- Kun Li
Source: StackOverflow