GKE cluster upgrade by switching to a new pool: will inter-cluster service communication fail?

9/20/2018

From this article (https://cloudplatform.googleblog.com/2018/06/Kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime.html) I learnt that it is possible to create a new node pool, then cordon and drain the old nodes one by one, so that workloads get rescheduled onto nodes in the new pool.

To me, a new node pool seems to indicate a new cluster. The reason: we have two node pools in GKE, and they're listed as two separate clusters.

My question is: after the pods behind a service are moved to a new node, if that service is called from other pods still running on the old nodes, will this inter-cluster service call fail?

-- twimo
google-kubernetes-engine
kubernetes

2 Answers

9/20/2018

You don't create a new cluster per se. You upgrade the master(s) and then you create a new node pool with nodes that have a newer version. Make sure the new node pool shares the same network as the original node pool.

If you have a service with one replica (one pod) and that pod lives on one of the nodes you are upgrading, you need to allow time for Kubernetes to create a replacement pod on a different node that is not being upgraded. During that time, your service will be unavailable.

If you have a service with multiple replicas, chances are that you won't see any downtime, unless for some odd reason all your replicas are scheduled on the same node.

Recommendation: before doing node upgrades, scale up the workloads that back your services (Deployments, StatefulSets, etc.) by one or two replicas. DaemonSets have no replica count; they run one pod per eligible node, so this does not apply to them.
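As a rough sketch of that recommendation, the snippet below adds two replicas to a Deployment before any node is drained. The Deployment name `my-api`, the `EXTRA_REPLICAS` value, and the `run` dry-run wrapper are all assumptions made for illustration; only `kubectl scale` and `kubectl get ... -o jsonpath` are real commands.

```shell
# Hypothetical placeholder: "my-api" stands in for your own Deployment.
DEPLOYMENT="my-api"
EXTRA_REPLICAS=2

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Scale the Deployment to (current replicas + EXTRA_REPLICAS).
# $1 is the current replica count, which you could read with:
#   kubectl get deployment my-api -o jsonpath='{.spec.replicas}'
scale_up() {
  target=$(( $1 + EXTRA_REPLICAS ))
  run kubectl scale deployment "$DEPLOYMENT" --replicas="$target"
}

scale_up 1
```

Once the old nodes are gone, the same `kubectl scale` call with the original count brings the Deployment back to its normal size.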

StatefulSet tip: if you are running something like MySQL in a master-slave configuration, you will have some write downtime while the MySQL master is being rescheduled.

-- Rico
Source: StackOverflow

9/21/2018

Note that creating a new node pool does not create a new cluster. You can have multiple node pools within the same cluster. Workloads in different node pools can still interact with each other, since they are in the same cluster.

gcloud container node-pools create (the command to create node pools) requires that you specify the --cluster flag so that the new node pool is created within an existing cluster.
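For illustration, a minimal sketch of creating the pool and then listing the pools of that one cluster might look like the following. The cluster name, zone, and pool name are placeholders, and the `run` dry-run wrapper is my own addition so the commands are only printed unless `DRY_RUN=0` is set.

```shell
# Placeholder values (assumptions): substitute your own cluster, zone and pool name.
CLUSTER="my-cluster"
ZONE="us-central1-a"
NEW_POOL="pool-v2"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the new pool inside the EXISTING cluster named by --cluster.
create_pool() {
  run gcloud container node-pools create "$NEW_POOL" \
    --cluster="$CLUSTER" --zone="$ZONE"
}

# List every pool in that cluster; old and new pools appear side by side.
list_pools() {
  run gcloud container node-pools list --cluster="$CLUSTER" --zone="$ZONE"
}

create_pool
list_pools
```

Both pools showing up under the same `--cluster` value is a quick way to confirm that no second cluster was created.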

So, to answer the question directly: following the steps from that Google link will not cause any service interruption, nor will pods in the same cluster have any trouble communicating with each other during your migration.
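The cordon-and-drain part of those steps could be sketched roughly as below. This is not a copy of the linked post's script: the node names are placeholders, the `run` dry-run wrapper is an assumption of mine, and the old pool is assumed to be called `default-pool`. GKE labels each node with `cloud.google.com/gke-nodepool=<pool name>`, which is what the commented-out lookup relies on.

```shell
# Assumption: the pool being retired is called "default-pool".
OLD_POOL="default-pool"

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Cordon every old node first, so evicted pods cannot land on another old node.
cordon_all() {
  for node in "$@"; do
    run kubectl cordon "$node"
  done
}

# Drain the old nodes one at a time; DaemonSet pods are skipped, not evicted.
drain_all() {
  for node in "$@"; do
    run kubectl drain "$node" --ignore-daemonsets
  done
}

# For a real run, look the nodes up by their pool label instead:
#   NODES=$(kubectl get nodes -l cloud.google.com/gke-nodepool="$OLD_POOL" -o name)
NODES="node/old-1 node/old-2"   # placeholder node names

# $NODES is left unquoted on purpose so each name becomes its own argument.
cordon_all $NODES
drain_all $NODES
```

Pods that use emptyDir volumes, or that are not managed by a controller, make `kubectl drain` refuse to proceed unless extra flags are passed; check `kubectl drain --help` for your version before adding them.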

-- Patrick W
Source: StackOverflow