Can we update k8s leader immediately when the leader pod is dead?

10/23/2018

I followed this article to use k8s leader election for HA of my app, but I ran into an issue. Has anyone had the same experience? For example, I have 4 pod replicas, and one of them has already been elected leader. When this leader pod goes down (e.g. I kill the pod manually), the scheduler takes 30–40 seconds to start a new pod, but the old, dead leader is still kept as leader for 10 or more seconds before the leadership is renewed. Is there a way to update the leader immediately when the leader pod is dead? Or is there a setting I missed?

The article I'm referring to mentions the following, which is exactly the problem I have:

Because pods in Kubernetes have a grace period before termination, this may take 30-40 seconds.

Here is the demo YAML file I'm using: https://gist.githubusercontent.com/ginkgoch/563d8d8caf9e4dd99a0c8de323e9211c/raw/f1abb94647c60874e4625b1b94f8fa125bd1a5ea/k8s-leader-election.yaml

-- Howard
kubernetes
leader-election

1 Answer

10/23/2018

The article explains that this is due to the grace period. When the kill is issued, the leader pod is not yet dead; it is just shutting down.

You could shorten or skip the shutdown process with a force delete, or change the grace period in the pod specification. The risk then is that the pod might shut down without cleaning up fully - you'll know whether this is relevant to your Pods.
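As a rough sketch of the grace-period option, the fragment below sets `terminationGracePeriodSeconds` in the pod template of a Deployment. The Deployment name, labels, image, and the value of 5 seconds are placeholders I made up for illustration; they are not taken from the gist in the question. For a one-off force delete, `kubectl delete pod <pod-name> --grace-period=0 --force` skips the graceful shutdown entirely.

```yaml
# Illustrative fragment only: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: leader-demo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: leader-demo
  template:
    metadata:
      labels:
        app: leader-demo
    spec:
      # Kubernetes defaults to 30 seconds. A smaller value shortens the
      # window in which the terminating pod is still shutting down.
      # 5 is an arbitrary example value; pick one that still lets the
      # app clean up.
      terminationGracePeriodSeconds: 5
      containers:
        - name: app
          image: example/app:latest
```

A shorter grace period only reduces the shutdown window. How long the election itself takes after that still depends on the leader-election implementation you use.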

It should theoretically be possible to listen for the preStop hook and begin leader election as soon as a pod starts terminating. But then you risk having two leaders while the old leader is terminating (k8s should stop sending traffic to the old leader at this point, but it might still be doing something important, depending upon your design). The k8s.gcr.io/leader-elector implementation seems to require waiting for the old leader to fully stop. It's possible there are other implementations out there that support immediate election, but I've not found any with a quick search, and I think waiting for the old leader to terminate is not unusual.
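To make the preStop idea a bit more concrete, here is a minimal sketch of how such a hook is declared on a container. It assumes your own application exposes an endpoint that gives up leadership when called. The path `/release-leadership` and port 8080 are hypothetical and would have to be implemented by you; as far as I can tell, the k8s.gcr.io/leader-elector sidecar does not offer anything like this.

```yaml
# Illustrative container fragment (goes under spec.template.spec.containers).
# The endpoint below is hypothetical: the app itself must implement it.
- name: app
  image: example/app:latest        # placeholder image
  lifecycle:
    preStop:
      # Kubernetes runs this hook when termination starts, before it
      # sends SIGTERM to the container. The hook's run time counts
      # against the termination grace period.
      httpGet:
        path: /release-leadership  # hypothetical endpoint
        port: 8080                 # hypothetical port
```

Even with a hook like this, the two-leader risk described above remains unless the old leader actually stops its leader-only work before it releases leadership.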

-- Ryan Dawson
Source: StackOverflow