How highly available is the master zone of my GKE cluster?

6/2/2017

GKE seems to create a cluster with the master in a single availability zone, even though it offers an option to deploy nodes across multiple availability zones. I am concerned that if the master's zone goes down, I will no longer be able to manage my cluster. I understand my apps will continue to run, but it is a big concern that I would not be able to scale up my service, deploy a new version of my apps, etc.

Is my understanding that "a GKE cluster is vulnerable to its master zone going down" correct? If not, can you please explain why? If it is correct, what are my options for making the master highly available so that it can tolerate one availability zone going down?

-- Kamil
gcp
google-cloud-platform
kubernetes

2 Answers

6/2/2017

The GKE master is not highly available today: if its zone goes down, your cluster's Kubernetes API goes down with it. Note, however, that the GKE master is a managed service with a 99.5% SLA (https://cloud.google.com/container-engine/sla). In the future, GKE may offer high-availability options for the master (API server).
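To make the 99.5% figure concrete, here is a quick downtime-budget calculation. This is only arithmetic, not the SLA's own definition: the measurement window (a day and a 30-day month) is an assumption for illustration, and the SLA document itself defines how uptime is actually measured.

```python
def downtime_budget_minutes(sla_percent: float, period_minutes: float) -> float:
    """Minutes of unavailability allowed by an uptime percentage over a period."""
    return period_minutes * (1 - sla_percent / 100)


MINUTES_PER_DAY = 24 * 60                        # 1440
MINUTES_PER_30_DAY_MONTH = 30 * MINUTES_PER_DAY  # 43200 (assumed month length)

daily = downtime_budget_minutes(99.5, MINUTES_PER_DAY)
monthly = downtime_budget_minutes(99.5, MINUTES_PER_30_DAY_MONTH)

print(f"per day:   {daily:.1f} min")     # per day:   7.2 min
print(f"per month: {monthly / 60:.1f} h")  # per month: 3.6 h
```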

You are correct that if the Kubernetes master/API becomes unavailable for a short time, your deployed workloads (e.g. websites or other services) running on the cluster are not affected. You will, however, not be able to scale things up or down until the API comes back.

As a user, you cannot do anything to make the master highly available today.

That said, I would say 99.5% is pretty good uptime. It corresponds to roughly 7 minutes a day (https://uptime.is/99.5), so unless you are managing your cluster around the clock, you are unlikely to run into an outage often. If you are using automation, you should probably build in some retry logic.
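One common shape for that retry logic is exponential backoff with jitter. The sketch below is a generic helper, not part of any Kubernetes or GKE client library; `flaky_scale_call` is a hypothetical stand-in for a real API call, and treating `ConnectionError`/`TimeoutError` as the transient errors is an assumption you would adapt to whatever exceptions your client actually raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay grows as base, 2*base, 4*base, ... and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random fraction so many clients don't retry in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_scale_call() -> str:
        # Hypothetical call that fails while the API server is briefly unreachable.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("API server unavailable")
        return "scaled"

    print(retry_with_backoff(flaky_scale_call, base_delay=0.01))
```

Errors not listed in `retry_on` (for example a validation error) propagate immediately, since retrying them would not help.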

-- AhmetB - Google
Source: StackOverflow

1/8/2019

GKE regional clusters, which offer a multi-master setup with one master in each zone in the region, are now generally available. See the launch blog post for a quick overview.

-- lambshaanxy
Source: StackOverflow