How does KOPS recreate the Master Node in AWS?

3/27/2019

I'm experimenting with kops on AWS. My cluster is composed of 1 master node and 3 worker nodes. Everything works fine, and to test a master node failure I terminated the corresponding EC2 instance; as expected, the AutoScaling group handled the problem, created a new instance, and that instance became the new master node. So far so good.

My question is: how does the AutoScaling group configure the new EC2 instance so that it comes up properly as a Kubernetes master node? Is there a predefined AMI created when I set up kops? Or is there a user data script that runs each time a new instance is created?
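For anyone who wants to poke at this on their own cluster, one way is to dump the instance's user data with the AWS CLI. A minimal sketch (the instance ID is a placeholder):

Bash# aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 \
        --attribute userData --query 'UserData.Value' --output text \
        | base64 --decode | head -n 40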

Thanks.

-- user1297406
amazon-web-services
kops
kubernetes

2 Answers

11/6/2019

I had the same follow-up question as the other answer raises: how does etcd survive the deletion of the master?

I too used to wonder how the masters could be completely shut down, have their OS replaced, be upgraded, or have their VM size increased, all without losing the state of etcd, when etcd saves its data in directories on the local host instead of in Persistent Volume Claims.

Bash# kubectl exec -it etcd-manager-main-ip-10-10-10-1.ec2.internal --namespace=kube-system -- df -h | egrep -hi "/dev/|Mounted"
Filesystem      Size   Used     Avail   Use%   Mounted on
/dev/nvme0n1p2  100G   6.1G     94G     7%     /rootfs
tmpfs           31G    0        31G     0%     /rootfs/dev/shm
shm             64M    0        64M     0%     /dev/shm
/dev/nvme1n1    20G    447M     20G     3%     /rootfs/mnt/master-vol-0a1f654eb1018c472
/dev/nvme2n1    40G    5.4G     34G     14%    /rootfs/mnt/master-vol-06b6514080c8e7202

Pay attention to the volume mounts.
kops attaches persistent EBS volumes (network-attached storage) to the masters, reserved for use by two etcd clusters: one stores the state of Kubernetes, and a second stores Kubernetes events (separating them increases reliability).
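You can also see these volumes from the AWS CLI. A minimal sketch, assuming a cluster named mycluster.example.com (kops tags the volumes it creates with a KubernetesCluster tag, and the etcd volumes typically carry names like a.etcd-main.mycluster.example.com):

Bash# aws ec2 describe-volumes \
        --filters "Name=tag:KubernetesCluster,Values=mycluster.example.com" \
        --query "Volumes[].{Id:VolumeId,State:State,Name:Tags[?Key=='Name']|[0].Value}" \
        --output table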

You'll notice that the etcd pods don't define any Persistent Volume Claims (which are usually how EBS storage is attached); that's to avoid a circular dependency where Kubernetes components would be needed to host Kubernetes components.
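You can confirm that by inspecting the pod spec: the volumes are plain hostPath mounts, with no persistentVolumeClaim entries. A sketch, reusing the pod name from the df example above (yours will differ):

Bash# kubectl get pod etcd-manager-main-ip-10-10-10-1.ec2.internal \
        --namespace=kube-system -o jsonpath='{.spec.volumes}'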

So how do the masters get the equivalent of EBS volumes without using Persistent Volume Claims, you might ask? Simple: kops uses (non-user-configured) definitions tied to the masters' instance groups to mount EBS volumes on the masters at predefined locations where etcd expects them to exist.
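Those definitions live in the cluster spec rather than in anything you configure by hand. A sketch of how to look at them, assuming the illustrative cluster name above and a configured state store; the etcdClusters section ties each etcd member to a master instance group:

Bash# kops get cluster mycluster.example.com -o yaml | grep -A 8 etcdClusters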

-- neokyle
Source: StackOverflow

4/10/2019

This is because kops has the concept of instance groups. On AWS they map directly to AutoScaling groups, which are a similar concept. You can check your instance groups by running kops get ig, and you can also scale your master and nodes down to 0 and then relaunch them by editing the relevant group with kops edit ig nodes (or the master's instance group name). The second piece is the kops state store, the location where the cluster configuration lives. It holds most of the Kubernetes configuration, except for some resources, for example Deployments (i.e. internal state), which are stored in etcd.
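A minimal sketch of those commands, assuming KOPS_STATE_STORE points at your state store bucket and illustrative cluster and instance group names:

Bash# export KOPS_STATE_STORE=s3://clusters.example.com
Bash# kops get ig --name mycluster.example.com
Bash# kops edit ig master-us-east-1a --name mycluster.example.com   # e.g. set minSize/maxSize to 0
Bash# kops update cluster mycluster.example.com --yes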

So in your case, when you delete a master node, AWS sees that the size of your AutoScalingGroup is 0 instead of 1, so it recreates the EC2 machine.
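You can read this from the ASG's activity history via the AWS CLI. A sketch, assuming the group follows kops' usual master-<zone>.masters.<cluster> naming (illustrative):

Bash# aws autoscaling describe-scaling-activities \
        --auto-scaling-group-name master-us-east-1a.masters.mycluster.example.com \
        --max-items 3

The activity entries look like this: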

Description: Launching a new EC2 instance: i-0e06f8fbb78aca2e6
Cause: At 2019-04-10T12:54:31Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.

After that, Kubernetes takes its configuration from the S3 bucket and its internal state from etcd. The next question is how etcd survives the deletion of the master. You can check this in your Volumes: etcd has two separate volumes (mirroring the etcd pods, one for events and one for main). After you delete the master, the volumes go into the available state, and after the new master EC2 instance is spawned, these volumes are mounted to the new master, restoring the internal state (I'm not sure, but I think protokube is also somewhere in the picture).
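A minimal sketch of checking both pieces, with an illustrative bucket, cluster name, and volume ID: list the configuration kops keeps in S3, and watch the etcd volume's state flip from in-use to available and back as the master is replaced:

Bash# aws s3 ls s3://clusters.example.com/mycluster.example.com/
Bash# aws ec2 describe-volumes --volume-ids vol-0a1f654eb1018c472 \
        --query 'Volumes[].{State:State,AttachedTo:Attachments[0].InstanceId}'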

This is also the reason why you can restore your kops cluster from just the S3 bucket: it holds all the configuration that kops needs to run. The exception is the internal state, which lives in etcd and for which you would need a separate backup.
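A minimal sketch of such a backup with plain etcdctl, assuming network access to the etcd endpoint and the client certificates kops generates (the endpoint port, which kops has historically used for the main etcd cluster, and the certificate paths are illustrative):

Bash# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-main-backup.db \
        --endpoints=https://127.0.0.1:4001 \
        --cacert=/path/to/etcd-ca.crt \
        --cert=/path/to/etcd-client.crt \
        --key=/path/to/etcd-client.key
Bash# aws s3 cp /tmp/etcd-main-backup.db s3://my-backup-bucket/etcd-main-backup.db

Also worth knowing: newer kops versions that run etcd via etcd-manager take periodic etcd backups into the state store automatically.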

-- aurelius
Source: StackOverflow