We have a cluster with a master node (foo-1), and two worker nodes (foo-2 and foo-3). We have a pod that was running on foo-3 (as decided by Kubernetes). We purposely shut down foo-3 as an experiment.
My expectation was that Kubernetes would "see" the shutdown and automatically restart the pod on foo-2. But that didn't happen. In fact, it seemed to think that the pod was still running on foo-3.
After five minutes of waiting, Kubernetes finally recognized that the cluster node had disappeared and responded gracefully by restarting the pod on foo-2. Five minutes is too long for us, as this is not a replicated application. How can we make that timeout drastically shorter (say, 10 seconds)? Ideally, if the host is shut down gracefully (e.g. for patching), the pod should be moved immediately.
There is a --pod-eviction-timeout parameter in kube-controller-manager which is 5m by default:

    --pod-eviction-timeout duration   The grace period for deleting pods on failed nodes. (default 5m0s)
You need to modify it if you want to speed up the eviction process.
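For example, on a kubeadm-provisioned cluster the controller manager runs as a static Pod, so a minimal sketch of the change (the path and the 30s value are assumptions for that setup) looks like this:

    # On the master node (foo-1), assuming a kubeadm cluster where
    # kube-controller-manager runs as a static Pod. Saving the manifest
    # makes kubelet restart the component with the new flag.
    sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml

    # In the container's command list, add or change:
    #   - --pod-eviction-timeout=30s    # instead of the 5m0s default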
But if you want to minimize your pod's downtime when a node goes down, you need to adjust the following parameters as well (a sketch of where to set them follows the list):
kubelet: node-status-update-frequency=4s (default 10s)
kube-controller-manager: node-monitor-period=2s (default 5s)
kube-controller-manager: node-monitor-grace-period=16s (default 40s)
kube-controller-manager: pod-eviction-timeout=30s (default 5m)
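As a rough sketch (again assuming kubeadm defaults; exact file locations differ per distribution), the kubelet flag goes on the worker nodes and the controller-manager flags on the master:

    # Worker nodes (foo-2, foo-3): pass the flag via KUBELET_EXTRA_ARGS,
    # typically in /etc/default/kubelet on Debian/Ubuntu or
    # /etc/sysconfig/kubelet on RHEL-based hosts (assumed paths):
    #   KUBELET_EXTRA_ARGS="--node-status-update-frequency=4s"
    sudo systemctl daemon-reload && sudo systemctl restart kubelet

    # Master node (foo-1): add the flags to the kube-controller-manager
    # static Pod manifest, as above. node-monitor-grace-period should be
    # a multiple of node-status-update-frequency (here 16s = 4 x 4s).
    #   - --node-monitor-period=2s
    #   - --node-monitor-grace-period=16s
    #   - --pod-eviction-timeout=30s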
And, of course, you can always run your Deployments with replicas: 2, so the Service stays up even if one node goes down.
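For completeness, a minimal Deployment sketch with two replicas (the name, labels, and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                        # placeholder name
    spec:
      replicas: 2                         # survives the loss of one pod
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-registry/my-app:1.0   # placeholder image

In practice you may also want pod anti-affinity so the two replicas are scheduled onto different nodes rather than ending up together on the node that fails.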