Safely remove master from Kubernetes HA cluster

1/23/2020

I have a development K8s cluster deployed with kops on AWS EC2 instances, which I initially set up as an HA architecture with 3 masters and 3 nodes.

Now, to save cost, I would like to turn off 2 of the 3 masters and leave just 1 running.

I tried kubectl drain, but it was ineffective, and simply terminating the node made the cluster connection unstable.

Is there a safe way to remove a Master?

-- VenturiEffect
amazon-web-services
kops
kubectl
kubernetes
master

1 Answer

1/23/2020

This issue has already been discussed in the GitHub issue HA to single master migration.

A solution has already been worked out for you.

etcd-manager was introduced in kops 1.12, and the main and events etcd clusters are backed up to S3 (the same bucket as KOPS_STATE_STORE) automatically and regularly.

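Before starting, it can be worth confirming that the etcd backups are actually present in the state-store bucket. A quick check with the AWS CLI (bucket and cluster names are placeholders, matching the restore commands further below):

$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main/
$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events/
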
So if your cluster was deployed with kops 1.12 or newer, the following steps should work:

  1. Delete the extra etcd zones in the cluster
$ kops edit cluster

In the etcdClusters section, remove the extra etcdMembers items so that only one instanceGroup remains for main and events, e.g.

  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-ap-southeast-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-ap-southeast-1a
      name: a
    name: events
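
Before applying, running kops update cluster without --yes performs a dry run and prints the changes it would make, which is a useful sanity check at this point:

$ kops update cluster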
  2. Apply the changes
$ kops update cluster --yes
$ kops rolling-update cluster --yes
  3. Remove the 2 extra master instance groups
$ kops delete ig master-xxxxxx-1b
$ kops delete ig master-xxxxxx-1c

This action cannot be undone, and it will delete the 2 master nodes immediately.
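
If you are unsure of the exact instance group names in your cluster, you can list them first with a standard kops subcommand before deleting:

$ kops get instancegroups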

Now that 2 out of 3 of your master nodes are deleted, etcd loses quorum and the kube-apiserver becomes unreachable. It is normal that your kops and kubectl commands no longer work after this step.

  4. Restart the etcd cluster with a single master node
    This is the tricky part. SSH into the remaining master node, then:
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet

Download the etcd-manager-ctl tool. If you are using a different etcd-manager version, adjust the download link accordingly:

$ wget https://github.com/kopeio/etcd-manager/releases/download/3.0.20190930/etcd-manager-ctl-linux-amd64
$ mv etcd-manager-ctl-linux-amd64 etcd-manager-ctl
$ chmod +x etcd-manager-ctl
$ mv etcd-manager-ctl /usr/local/bin/

Restore the backups from S3 (see the official docs):

$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main restore-backup 2019-10-16T09:42:37Z-000001
# do the same for events
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events restore-backup 2019-10-16T09:42:37Z-000001

This does not start the restore immediately; you need to restart etcd by killing the related containers and then starting kubelet again.

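A sketch of that step, assuming the Docker container runtime (the kops default at the time); the container IDs are placeholders you have to look up yourself. Since kubelet and protokube are stopped, the killed containers will not be recreated until kubelet is started again:

$ sudo docker ps | grep etcd
$ sudo docker kill <container id of the main etcd container>
$ sudo docker kill <container id of the events etcd container>

Then bring the services back up:
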
$ sudo systemctl start kubelet
$ sudo systemctl start protokube

Wait for the restore to finish; then kubectl get nodes and kops validate cluster should work again. If not, you can simply terminate the EC2 instance of the remaining master node in the AWS console; a new master node will be created by the Auto Scaling Group, and the etcd cluster will be restored from the backup.
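
Once the cluster recovers, the final check is just the commands mentioned above:

$ kubectl get nodes
$ kops validate cluster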

-- VKR
Source: StackOverflow