Kubernetes rescheduling of pods after a node becomes unreachable

1/29/2017

The example of running ZooKeeper in Kubernetes shows how pods are rescheduled onto different nodes after a node is drained, typically because maintenance has to be performed on it.

The pods are rescheduled with the same identity they had before, in this case the myid of each ZooKeeper server, which corresponds to the ordinal of its pod (zk-0, zk-1 and so on).

If a node is not responding (possibly due to overload or a network problem), is it possible for Kubernetes to reschedule a pod on another node while the original pod is still running?

It seems some of this behavior has been specified explicitly, but I don't know how to verify it short of setting up multiple nodes on a cloud provider and trying it.

-- giorgiosironi
kubernetes

1 Answer

1/30/2017

If a node is unresponsive, Kubernetes >=1.5 will not reschedule a pod with the same identity until the original pod is confirmed to have been terminated. The behavior with respect to StatefulSet is detailed here.

Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the ‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:

  • The Node object is deleted (either by you, or by the Node Controller).
  • The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
  • Force deletion of the Pod by the user.

Since the name of the pod is a lock, we never create two pods with the same identity, giving us 'at-most-one' semantics for StatefulSets. The user can override this behavior by performing force-deletions (and manually guaranteeing that the pod is fenced), but there is no automation that can lead to two pods with the same identity.
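To make the "name as a lock" idea concrete, here is a minimal toy model in Python. It is not Kubernetes code: ToyApiServer, reconcile, and the node names node-a and node-b are hypothetical names invented for illustration. It only sketches the rule that a replacement cannot be created while an entry with the same name still exists.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    node: str
    phase: str = "Running"


@dataclass
class ToyApiServer:
    """Hypothetical stand-in for the apiserver: pod names are unique keys."""
    pods: dict = field(default_factory=dict)

    def create(self, name, node):
        # The name acts as the lock: creation fails while an entry exists.
        if name in self.pods:
            return False
        self.pods[name] = Pod(name, node)
        return True

    def mark_node_unreachable(self, node):
        # The pod entry is kept; only its reported state changes.
        for pod in self.pods.values():
            if pod.node == node:
                pod.phase = "Unknown"

    def remove(self, name):
        # Stands for any of the three removal paths listed above.
        self.pods.pop(name, None)


def reconcile(api, name, healthy_node):
    """Toy controller step: recreate the pod only if its name is free."""
    return api.create(name, healthy_node)


api = ToyApiServer()
api.create("zk-0", "node-a")
api.mark_node_unreachable("node-a")
blocked = reconcile(api, "zk-0", "node-b")    # entry still present
api.remove("zk-0")                            # e.g. a force deletion
recreated = reconcile(api, "zk-0", "node-b")  # name is free again
print(blocked, recreated, api.pods["zk-0"].node)
```

In this sketch the first reconcile call is refused because zk-0 is still registered in the Unknown state, and the second one succeeds only after the entry has been removed. Note that the model says nothing about whether the old process is really gone, which is why a force deletion puts the fencing burden on the user.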

The changes in Kubernetes 1.5 ensure that we prioritize safety in the case of StatefulSets. The Node Controller documentation has some details about the change. You can read the full proposal here.

-- Anirudh Ramanathan
Source: StackOverflow