GKE: Kubernetes Master/kubectl unresponsive during node scale

8/28/2019

I have a cluster with fairly static workloads that are deployed to one fixed-size node pool (default). An additional node pool holds elastic workloads; its size varies between 0 and ~10 instances. During scaling, the cluster is unresponsive most of the time:

  1. I can't access some cluster pages in the GKE console, such as Workloads (sorry for the German interface): https://i.stack.imgur.com/MSd3Y.png
  2. kubectl can't connect, and existing connections such as port-forward or get pods -w are dropped:
    1. E0828 12:36:14.495621 10818 portforward.go:233] lost connection to pod
    2. The connection to the server 35.205.157.182 was refused - did you specify the right host or port?
  3. Also, I think dependent tools like prom-operator run into issues, as some very basic metrics like kube_pod_container_info are missing data during that time

What I have tried so far is switching from a regional to a zonal cluster (no-single-node-master?), but that didn't help. Also, the issue does not occur on every scaling of the node pool, only in most cases.

So my question is: how can I debug or fix this?
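To narrow down when exactly the API endpoint drops, one option is to log its reachability and compare the timestamps with the node pool's scale events. The following is only a minimal sketch, not an official GKE tool: the function names `is_reachable` and `watch`, the port 443, and the polling interval are assumptions of mine. It only checks that a TCP connection can be opened, not that the API answers correctly.

```python
import socket
import time
from datetime import datetime, timezone


def is_reachable(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def watch(host: str, port: int = 443, interval: float = 5.0, checks=None) -> None:
    """Print a timestamped line whenever reachability changes.

    checks=None polls forever; an integer limits the number of probes.
    """
    last = None
    done = 0
    while checks is None or done < checks:
        up = is_reachable(host, port)
        if up != last:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            state = "reachable" if up else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {state}")
            last = up
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
```

Running `watch("35.205.157.182")` (the address from the error message above) in a separate terminal while the pool scales would show whether the outages line up with the scale-ups.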

-- Can
google-kubernetes-engine
kubernetes

1 Answer

8/29/2019

This is expected behavior.

When you create your cluster, the machine type used for the master is chosen based on the node pool size. When the autoscaler later creates more nodes, the master's machine type is changed so that it can handle the new number of nodes.

While the master is being updated to the new machine type, you lose the connection to the API and receive the message you reported. Because communication with the API is broken, the Cloud Console also can't display any information about the cluster, as the attached image shows.
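If scripts or tooling have to keep working through such an interruption, they can retry instead of failing on the first refused connection. Here is a rough sketch of retrying with exponential backoff. The helper name `call_with_retry`, the number of attempts, and the delays are assumptions for illustration, not values taken from GKE; tune them to how long the master is actually unavailable in your cluster.

```python
import time


def call_with_retry(fn, attempts=6, base_delay=1.0, max_delay=30.0,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call fn() and retry on connection errors with exponential backoff.

    The delay doubles after each failed attempt, capped at max_delay.
    The last failure is re-raised once all attempts are used up.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

`fn` would be whatever call talks to the API server, for example a function that lists pods through your client library of choice. `ConnectionRefusedError` is a subclass of `ConnectionError`, so a refused connection like the one in the question is retried.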

You can try to avoid this by changing the minimum number of nodes at creation time. For example, you mentioned that the limits are 0 and 10, so when you create the cluster you can use the midpoint, 5, which will likely support the maximum number of nodes in case the workloads require them.

-- Edgar Gore
Source: StackOverflow