Resource-utilization based liveness checks in Kubernetes

1/4/2019

In Kubernetes, a liveness probe periodically checks whether a container is accessible; if the check fails, Kubernetes kills the container and starts a new one.

We have a Java webapp, and in most cases I see the application become unavailable due to memory pressure. We have a liveness probe, but since the health check call needs very little memory, it succeeds even while many other requests that need more memory are left lingering.

The GC runs continuously trying to reclaim memory, but to no avail, and the instance never recovers. In such a state I would like Kubernetes to kill the pod, but because the liveness probe still succeeds, it doesn't. One way to handle this would be to make the liveness probe a more resource-intensive operation, but then it would consume more cycles and put additional load on the system.

So I would like some kind of liveness check that monitors the slope of the garbage collection count graph for the Java process. Put another way, I want my liveness probe to depend on telemetry data. Is there any way to achieve that?
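To make the "slope" idea concrete, here is a minimal sketch of how it might be measured from inside the JVM. It uses the standard java.lang.management API; the class name GcSlope, the helper names, and the 1-second sampling window are illustrative assumptions, not something from an existing setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: estimates how fast the JVM's GC count is growing. */
final class GcSlope {

    /** Sums collection counts over all collectors; undefined (-1) values are skipped. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Slope between two samples, in collections per second. */
    public static double slopePerSecond(long countBefore, long countAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0; // no meaningful interval, so report a flat slope
        }
        return (countAfter - countBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcCount();
        long start = System.nanoTime();
        Thread.sleep(1_000); // assumed sampling window
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        double slope = slopePerSecond(before, totalGcCount(), elapsedMillis);
        System.out.printf("GC slope: %.2f collections/s%n", slope);
    }
}
```

A check built on this would compare the slope against a threshold chosen by observing the application under normal load.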

-- KKishore
kubernetes

1 Answer

1/4/2019

Health probes are often HTTP requests that check the status code returned by an endpoint. However, you can also run a command as a health check; the Kubernetes documentation gives an example that runs cat on a file. Instead of cat, you could run a custom script that checks the stat you care about (e.g. Java heap size). If the script is complex, you may want to include it in your image or mount it into the container from a ConfigMap. There are also ways to get metrics without running shell commands: you could query the Kubernetes metrics API, or you could have your Java app report its own state through a REST endpoint that the probe calls (e.g. something like Spring Boot Actuator).
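As a rough sketch of the last option without any framework, the app could expose a small endpoint that reports unhealthy when too much recent wall-clock time went to garbage collection. Everything specific here is an assumption for illustration: the class name GcHealthEndpoint, the path /healthz/gc, the 10-second sampling interval, and the 0.5 threshold. It uses the JDK's built-in com.sun.net.httpserver and java.lang.management APIs.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical health endpoint driven by the share of time spent in GC. */
final class GcHealthEndpoint {
    static final double MAX_GC_FRACTION = 0.5; // assumed threshold, tune per application
    private static volatile double lastGcFraction = 0.0;

    /** Sums accumulated GC time (ms) over all collectors; undefined (-1) values are skipped. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of the elapsed interval that was reported as GC time. */
    public static double gcFraction(long gcMillisBefore, long gcMillisAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / elapsedMillis;
    }

    /** 503 when the GC fraction exceeds the threshold, otherwise 200. */
    public static int statusFor(double fraction) {
        return fraction > MAX_GC_FRACTION ? 503 : 200;
    }

    /** Starts the background sampler and the HTTP endpoint on the given port. */
    public static HttpServer start(int port) throws IOException {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "gc-sampler");
            t.setDaemon(true); // do not keep the JVM alive just for sampling
            return t;
        });
        long[] previous = {totalGcMillis(), System.nanoTime()};
        sampler.scheduleAtFixedRate(() -> {
            long gcNow = totalGcMillis();
            long timeNow = System.nanoTime();
            lastGcFraction = gcFraction(previous[0], gcNow, (timeNow - previous[1]) / 1_000_000);
            previous[0] = gcNow;
            previous[1] = timeNow;
        }, 10, 10, TimeUnit.SECONDS);

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/healthz/gc", exchange -> {
            double fraction = lastGcFraction;
            byte[] body = String.format("gcFraction=%.2f%n", fraction)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusFor(fraction), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling GcHealthEndpoint.start(8081) during application startup (the port is again an assumption) would let an httpGet liveness probe target that path and port; the kubelet treats any status code from 200 up to but not including 400 as success, so a 503 counts as a failure. What getCollectionTime reports differs between collectors, so the threshold would need tuning against the collector actually in use.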

-- Ryan Dawson
Source: StackOverflow