Kubernetes pods getting auto-restarted on long GC cycles

6/22/2018

We are noticing restarts of our Kubernetes pods (on Google Container Engine) whenever the JVM's garbage collection takes a little long.

Specifically, any time the pause seems to cross ~20 seconds, it causes a restart of the pod.

1) The JVM is not out of heap memory. It's still using less than 20% of the allocated heap. It's just that once in a long while, a particular GC cycle takes long (possibly due to I/O on that pod's disk at the time).

2) I tried adjusting the liveness check parameters to periodSeconds=12 and failureThreshold=5 (roughly as in the sketch below), so that the liveness checking process waits at least 12 * 5 = 60 seconds before deciding that a pod has become unresponsive and replacing it with a new one, but it still restarts the pod as soon as the GC pause crosses 20-22 seconds.
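
For reference, the probe looks roughly like this (a minimal sketch only; the health endpoint path and port are placeholders, not our actual deployment values):

    livenessProbe:
      httpGet:
        path: /healthz        # placeholder health endpoint
        port: 8080            # placeholder container port
      periodSeconds: 12       # probe every 12 seconds
      failureThreshold: 5     # tolerate 5 consecutive failures (~60s) before restart
      # timeoutSeconds is left at its default of 1 second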

Could anyone comment on why this might be happening and what else I can adjust so the pod is not restarted on this GC pause? It's a pity, because there is a lot of heap capacity still available, and memory is not really a reason the pod should be replaced.

-- Roshan
garbage-collection
google-kubernetes-engine
kubernetes

1 Answer

6/22/2018

Found it.

I had to adjust timeoutSeconds from the default of 1 second to 5 seconds, in addition to setting periodSeconds to 12, to make it wait ~60 seconds before flagging a pod as unresponsive.
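
A minimal sketch of the adjusted probe (path and port are placeholders):

    livenessProbe:
      httpGet:
        path: /healthz        # placeholder health endpoint
        port: 8080            # placeholder container port
      periodSeconds: 12       # probe every 12 seconds
      timeoutSeconds: 5       # raised from the default of 1 second
      failureThreshold: 5     # 5 consecutive failures before the pod is restarted

The key parameter is timeoutSeconds: with the default of 1 second, a probe that gets no response within 1 second is counted as a failure, so a pod stalled in a GC pause presumably keeps failing probes even though it is about to recover; giving each probe 5 seconds lets a slow-but-alive pod respond in time.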

-- Roshan
Source: StackOverflow