How do I make Kubernetes evict a pod that has not been ready for a period of time?

8/16/2019

I have readiness probes configured on several pods (which are members of deployment-managed replica sets). They work as expected -- readiness is required as part of the deployment's rollout strategy, and if a healthy pod becomes NotReady, the associated Service will remove it from the pool of endpoints until it becomes Ready again.

Furthermore, I have external health checking (using Sensu) that alerts me when a pod becomes NotReady.

Sometimes, a pod will report NotReady for an extended period of time, showing no sign of recovery. I would like to configure things such that, if a pod stays in NotReady for an extended period of time, it gets evicted from the node and rescheduled elsewhere. I'll settle for a mechanism that simply kills the container (leading it to be restarted in the same pod), but what I really want is for the pod to be evicted and rescheduled.

I can't seem to find anything that does this. Does it exist? Most of my searching turns up things about evicting pods from NotReady nodes, which is not what I'm looking for at all.

If this doesn't exist, why? Is there some other mechanism I should be using to accomplish something equivalent?

EDIT: I should have specified that I also have liveness probes configured and working the way I want. In the scenario I’m talking about, the pods are “alive.” My liveness probe is configured to detect more severe failures and restart accordingly and is working as desired.

I’m really looking for the ability to evict based on a pod being live but not ready for an extended period of time.

I guess what I really want is the ability to configure an arbitrary number of probes, each with different expectations it checks for, and each with different actions it will take if a certain failure threshold is reached. As it is now, liveness failures have only one method of recourse (restart the container), and readiness failures also have just one (just wait). I want to be able to configure any action.

-- JakeRobb
kubernetes

3 Answers

8/17/2019

As of Kubernetes v1.15, you might want to use a combination of a readiness probe and a liveness probe to achieve the outcome you want. See configure liveness and readiness probes.
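One way to read this suggestion, as a sketch rather than a recommendation: point a much more patient liveness probe at the same endpoint the readiness probe uses, so that a container that stays unready long enough gets restarted. The path `/ready`, port `8080`, and all timings below are assumptions for illustration. Note that this restarts the container in place; it does not evict or reschedule the pod. Since a container has only one livenessProbe, it would also replace an existing liveness check rather than add to it.

```yaml
# Fragment of a container spec; endpoint, port, and timings are hypothetical.
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3     # marked NotReady after ~30s of consecutive failures
livenessProbe:
  httpGet:
    path: /ready          # same check as the readiness probe
    port: 8080
  periodSeconds: 10
  failureThreshold: 60    # container restarted after ~10 minutes of consecutive failures
```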

A new feature that holds off the liveness probe until the container has finished starting is likely to be introduced in v1.16. There will be a new probe called startupProbe that can handle this in a more intuitive manner.
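For reference, a startupProbe is declared alongside the other probes; until it succeeds, the kubelet holds off the liveness and readiness probes. A minimal sketch, assuming a hypothetical `/healthz` endpoint on port `8080` and a cluster version where the feature is available:

```yaml
# Fragment of a container spec; endpoint, port, and timings are hypothetical.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30    # allows up to ~300s for startup before the container is killed
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3     # only starts counting once the startup probe has succeeded
```

This helps with slow starts; on its own it does not act on a pod that becomes NotReady later in its life.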

-- Seth
Source: StackOverflow

9/24/2019

You could use Prometheus metrics for this and create an alert like here. Based on that alert, you can configure a webhook in Alertmanager that reacts accordingly (action: kill the Pod); the Pod will then be recreated by its controller (the ReplicaSet) and placed on a node by the scheduler.
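The linked alert is not reproduced here, so the following is only a sketch of what such a setup might look like. It assumes kube-state-metrics is being scraped (it exposes `kube_pod_status_ready`). The alert name, the 15-minute window, the receiver names, and the webhook URL are all hypothetical, and the service behind the webhook that actually deletes the pod would have to be written separately.

```yaml
# Prometheus rule file (sketch; names and durations are assumptions)
groups:
  - name: pod-readiness
    rules:
      - alert: PodNotReadyTooLong
        # Fires for each pod whose Ready condition has been false for 15 minutes.
        expr: kube_pod_status_ready{condition="false"} == 1
        for: 15m
        labels:
          severity: warning
---
# alertmanager.yml fragment (sketch; receiver names and URL are hypothetical)
route:
  receiver: default
  routes:
    - match:
        alertname: PodNotReadyTooLong
      receiver: pod-killer
receivers:
  - name: default
  - name: pod-killer
    webhook_configs:
      - url: http://pod-killer.example.svc:8080/alerts
```

Deleting a pod that belongs to a ReplicaSet causes a replacement to be created and scheduled, possibly on a different node, which is closer to the eviction behavior asked about than a container restart.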

-- mario
Source: StackOverflow

8/17/2019

You can configure an HTTP liveness probe or a TCP liveness probe, depending on the type of container image, and use periodSeconds to control how often it runs.

    livenessProbe:
      # ..... (probe handler elided in the original)
      initialDelaySeconds: 3  # wait 3 seconds before the first probe
      periodSeconds: 5        # the kubelet performs the liveness probe every 5 seconds

-- Subramanian Manickam
Source: StackOverflow