I have a development Kubernetes cluster deployed with kops on AWS EC2 instances, which I initially set up as an HA architecture with 3 masters and 3 nodes.
Now, to save costs, I would like to turn off 2 of the 3 masters and leave just 1 running.
I tried kubectl drain, but it was ineffective, and simply terminating the instance made the cluster connection unstable.
Is there a safe way to remove a Master?
This issue has already been discussed in the GitHub issue HA to single master migration, and there is a ready-made solution for you.
Since etcd-manager was introduced in kops 1.12, the main and events etcd clusters are backed up to S3 (the same bucket as KOPS_STATE_STORE) automatically and regularly.
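Before changing anything, you can check that recent backups actually exist in the state store; the paths mirror the ones used in the restore step below (bucket and cluster name are placeholders):
$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main/
$ aws s3 ls s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events/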
So if your cluster runs kops 1.12 or newer, the following steps should work:
$ kops edit cluster
In the etcdClusters section, remove the extra etcdMembers entries so that only one instanceGroup remains for main and events, e.g.
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: main
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  name: events
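For comparison, the HA spec you are editing will have had three members per etcd cluster, one per master instance group, roughly like this (zone and instance group names are illustrative):
etcdClusters:
- etcdMembers:
  - instanceGroup: master-ap-southeast-1a
    name: a
  - instanceGroup: master-ap-southeast-1b
    name: b
  - instanceGroup: master-ap-southeast-1c
    name: c
  name: main
# the events cluster is defined the same way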
$ kops update cluster --yes
$ kops rolling-update cluster --yes
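Before deleting the extra master instance groups, it can help to list them to confirm their exact names (they depend on your cluster name and zones):
$ kops get ig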
$ kops delete ig master-xxxxxx-1b
$ kops delete ig master-xxxxxx-1c
This action cannot be undone, and it will delete the 2 master nodes immediately.
Now that 2 of your 3 master nodes are gone, the etcd cluster will have lost quorum and the kube-apiserver will be unreachable. It is normal for kops and kubectl commands to stop working after this step. SSH into the remaining master node and stop protokube and kubelet:
$ sudo systemctl stop protokube
$ sudo systemctl stop kubelet
Download the etcd-manager-ctl tool. If your cluster runs a different etcd-manager version, adjust the download link accordingly:
$ wget https://github.com/kopeio/etcd-manager/releases/download/3.0.20190930/etcd-manager-ctl-linux-amd64
$ mv etcd-manager-ctl-linux-amd64 etcd-manager-ctl
$ chmod +x etcd-manager-ctl
$ sudo mv etcd-manager-ctl /usr/local/bin/
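If you are not sure which etcd-manager version the remaining master is running, one way to check (assuming the default kops static-pod layout; the exact manifest path may differ between versions) is to look at the image tag in the etcd manifest:
$ sudo grep image: /etc/kubernetes/manifests/etcd.manifest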
Restore the backups from S3 (see the official docs for details):
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main list-backups
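# pick one of the backup names printed above and pass it to restore-backup, e.g.: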
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/main restore-backup 2019-10-16T09:42:37Z-000001
# do the same for events
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events list-backups
$ etcd-manager-ctl -backup-store=s3://<kops s3 bucket name>/<cluster name>/backups/etcd/events restore-backup 2019-10-16T09:42:37Z-000001
This does not trigger the restore immediately; you need to restart etcd: kill the etcd-related containers (one way is sketched below) and then start kubelet and protokube again.
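A minimal sketch of how to do that, assuming Docker is the container runtime on this master (container names vary between versions; <container-id> is a placeholder):
$ sudo docker ps | grep etcd
$ sudo docker kill <container-id>   # repeat for each etcd/etcd-manager container listed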
$ sudo systemctl start kubelet
$ sudo systemctl start protokube
Wait for the restore to finish; then kubectl get nodes and kops validate cluster should work again. If they do not, you can simply terminate the EC2 instance of the remaining master node in the AWS console; the Auto Scaling Group will create a new master node, and the etcd cluster will be restored from the S3 backup.
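To keep an eye on progress, you can watch the nodes and the etcd-manager static pods come back in kube-system (pod names are indicative):
$ kubectl get nodes
$ kubectl -n kube-system get pods | grep etcd-manager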