Cluster reconciliation in the event of node loss

6/22/2017

I have a cluster of 3 nodes that I would like to recover quickly after the loss of a single node. By recovering I mean that communication with my service resumes within a reasonable (preferably configurable) amount of time.

Here are the relevant details:

k8s version:

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.7", GitCommit:"8eb75a5810cba92ccad845ca360cf924f2385881", GitTreeState:"clean", BuildDate:"2017-04-27T10:00:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.7", GitCommit:"8eb75a5810cba92ccad845ca360cf924f2385881", GitTreeState:"clean", BuildDate:"2017-04-27T09:42:05Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

I have a service distributed over all 3 nodes. When one node fails, I observe the following behavior:

  1. The API server fails over to another node (via a custom fail-over mechanism), and the kubernetes service endpoint shows the correct IP address.
  2. The API server does not respond on 10.100.0.1 (its cluster IP).
  3. After some time, all relevant service endpoints are cleared, e.g. kubectl get ep --namespace=kube-system shows no ready addresses for any endpoint (see the commands after this list).
  4. Because of that, the service in question is not reachable on its service IP.
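
For reference, these are roughly the commands I use to observe steps 3 and 4 (the namespace is simply where my endpoints live; adjust as needed):

    # endpoints in the kube-system namespace; during the outage none of them
    # list any ready addresses
    kubectl get ep --namespace=kube-system

    # pod and node status while the node is down
    kubectl get pods --all-namespaces -o wide
    kubectl get nodes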

The service has both readiness and liveness probes; by design, only a single instance is ready at any given time, while all instances are live. I have checked that the instance that is supposed to be available actually is, i.e. it is both ready and live.
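
For completeness, the probes on the service's Pods are configured roughly like this (path, port, and timings here are placeholders, not my exact values):

    # illustrative probe configuration from the Pod spec (placeholder values)
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5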

This continues for more than 15 minutes, until the service Pod that was running on the lost node receives a NodeLost status; at that point the endpoints are re-populated and I can access the service as usual.

I have tried adjusting the pod-eviction-timeout and node-monitor-grace-period settings, to no avail; the recovery time is always roughly the same.
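
Concretely, these flags are set on the kube-controller-manager; the values below are examples of what I have been experimenting with, not my exact settings:

    # kube-controller-manager flags (example values only)
    --node-monitor-grace-period=20s
    --pod-eviction-timeout=30s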

Hence, my questions:

  1. Where can I read up, in detail, on the behavior of the key k8s components in the event of a node loss?
  2. What combination of parameters would reduce the time it takes the cluster to reconcile? This is intended for use in a test, so aggressive timeouts are acceptable.
-- deemok
kubernetes
kubernetes-health-check

0 Answers