How to Troubleshoot Frequent Kubernetes Node Resets on GKE?

12/31/2017

I have a test cluster in GKE (it runs my non-essential dev services). I am using the following GKE features for the cluster:

  • preemptible nodes (~4x f1-micro)
  • dedicated ingress node(s)
  • node auto-upgrade
  • node auto-repair
  • auto-scaling node-pools
  • regional cluster
  • stackdriver healthchecks

I created my preemptible node-pool as follows (auto-scaling between 3 and 6 actual nodes across 3 zones):

gcloud beta container node-pools create default-pool-f1-micro-preemptible \
    --cluster=dev --zone us-west1 --machine-type=f1-micro --disk-size=10 \
    --preemptible --node-labels=preemptible=true --tags=preemptible \
    --enable-autoupgrade --enable-autorepair --enable-autoscaling \
    --num-nodes=1 --min-nodes=0 --max-nodes=2

It all works great, most of the time. However, around 3 or 4 times per day I receive healthcheck notifications about downtime on some services running on the preemptible nodes. (This is exactly what I would expect ONCE per 24h, when the nodes get reclaimed/regenerated, but not 3+ times.)

By the time I receive the email notification the cluster has already recovered, but checking kubectl get nodes shows that the "age" of some of the preemptible nodes is ~5min, matching the approximate time of the outage.

I am not sure where to find the logs for what is happening, or WHY the resets were triggered (poorly-set resource settings? unexpected preemptible scheduling? "auto-repair"?). I expect this is all in Stackdriver somewhere, but I can't find WHERE. The Kubernetes/GKE logs are quite chatty, and everything is at INFO level (either hiding the error text, or the error logs are elsewhere).
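
For context, the only after-the-fact inspection I currently know how to do is via kubectl, using the preemptible=true label from the node-pool command above (the node name below is just a placeholder):

kubectl get nodes -l preemptible=true
    # lists only the nodes in the preemptible pool, with their AGE
kubectl describe node gke-dev-default-pool-f1-micro-preemptible-xxxx
    # the Events section at the end shows recent node-level activity

This tells me a node was recently recreated, but not what triggered the recreation.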

I must say, I do enjoy the self-healing nature of the setup, but in this case I would prefer to be able to inspect the broken pods/nodes before they are reclaimed. I would also prefer to troubleshoot without tearing-down/rebuilding the cluster, especially to avoid additional costs.

-- Paul Reimer
google-cloud-platform
google-kubernetes-engine
kubernetes

2 Answers

1/8/2018

I was able to solve this issue through a brute-force process: creating several test node-pools in GKE running the same workloads (I didn't bother connecting up ingress, DNS, etc.) and varying the options supplied to gcloud beta container node-pools create.

Since I was paying for these experiments, I did not run them all simultaneously, although that would have produced a faster answer. I also gave priority to the tests that kept the --preemptible option, since that flag affects the cost significantly.

My results showed that the issue was the --enable-autorepair argument; removing it reduced failed health checks to an acceptable level (the level I would expect for preemptible nodes).
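
For reference, the final node-pool creation looked essentially like the command from the question with --enable-autorepair dropped (all other flags unchanged):

gcloud beta container node-pools create default-pool-f1-micro-preemptible \
    --cluster=dev --zone us-west1 --machine-type=f1-micro --disk-size=10 \
    --preemptible --node-labels=preemptible=true --tags=preemptible \
    --enable-autoupgrade --enable-autoscaling \
    --num-nodes=1 --min-nodes=0 --max-nodes=2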

-- Paul Reimer
Source: StackOverflow

12/31/2017

Preemptible VMs offer the same machine types and options as regular compute instances and last for up to 24 hours.

This means that a preemptible instance will die at least once per 24h, but 3-4 times a day is still well within expectations. Preemptible VMs do not guarantee, nor is it stated anywhere, that preemption will happen only once per day.
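
If you want to confirm that the restarts are in fact preemptions (and not, say, auto-repair), Compute Engine records an operation for each preemption, which you can list with gcloud (narrow the filter by instance name or zone if needed):

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"
    # each preempted VM produces one compute.instances.preempted operation;
    # comparing this count with the number of node resets shows whether
    # preemption alone explains them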

-- Radek 'Goblin' Pieczonka
Source: StackOverflow