How can I restore the master node after it fails or its instance goes down?

6/11/2019

I am running an API service on Kubernetes, set up across three AWS instances (one master node, two worker nodes). I am considering the scenario where the instance hosting the master node goes down or crashes, for whatever reason: how should I restore the master node?

When I used Docker Swarm, the manager automatically came back up, reattached to the workers (or the workers reattached to it), and everything worked fine!

I tried running kubeadm init again, but it shows these errors:

    error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Swap]: running with swap on is not supported. Please disable swap
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

How should I solve this?

-- Bruce Oh
kubernetes

3 Answers

6/12/2019

You can use Rancher on top of your Kubernetes cluster. With RKE, you can deploy a cluster within minutes and use its etcd snapshots feature.

Backups and Disaster Recovery

Generally speaking, you should take care of etcd, since that is where your cluster state is stored. In case of disaster, you will always restore from etcd.
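
As a rough sketch, RKE's snapshot commands look like this (the snapshot name is your own choice, and cluster.yml is whatever config you used when provisioning; check rke --help on your version):

    # Take a one-off snapshot of etcd for an RKE-provisioned cluster
    rke etcd snapshot-save --config cluster.yml --name manual-backup

    # After a disaster, restore the cluster state from that snapshot
    rke etcd snapshot-restore --config cluster.yml --name manual-backup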

-- lowmath
Source: StackOverflow

6/12/2019

If you have just one master, you need to manually back up the certificates and etcd before running kubeadm init again.

I would recommend reading a really nice article, Backup and Restore a Kubernetes Master with Kubeadm, which tells you which files need to be backed up and how to restore them on a new master.

They use a Kubernetes CronJob to take a snapshot of etcd every 3 minutes:

We will create a Kubernetes CronJob to run that command periodically. There is no need to install etcdctl on the host system or to configure a cron job on the host system.
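
Along those lines, here is a minimal sketch of such a CronJob, assuming a kubeadm-provisioned cluster where etcd listens on https://127.0.0.1:2379 and its certificates live under /etc/kubernetes/pki/etcd; the image tag and the host backup path are illustrative, not taken from the article:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: etcd-backup
      namespace: kube-system
    spec:
      schedule: "*/3 * * * *"            # every 3 minutes, as in the article
      jobTemplate:
        spec:
          template:
            spec:
              # run on the master node, next to etcd
              hostNetwork: true
              nodeSelector:
                node-role.kubernetes.io/master: ""
              tolerations:
              - key: node-role.kubernetes.io/master
                effect: NoSchedule
              restartPolicy: OnFailure
              containers:
              - name: snapshot
                image: k8s.gcr.io/etcd:3.3.10    # match your cluster's etcd version
                command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
                  snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db
                volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
              volumes:
              - name: etcd-certs
                hostPath:
                  path: /etc/kubernetes/pki/etcd
                  type: Directory
              - name: backup
                hostPath:
                  path: /var/backups/etcd
                  type: DirectoryOrCreate

Since the backup lands on the master's own disk here, remember to copy the snapshots off the node (for example to S3), so they survive the instance itself dying.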

You will have to remember to back up the certificates yourself, but you only do this once, and it can be done when you're running kubeadm init.
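
For example (the archive name is just an illustration; kubeadm keeps its certificates under /etc/kubernetes/pki by default):

    # On the old master: archive the kubeadm certificates
    sudo tar -czf kubernetes-pki.tar.gz -C / etc/kubernetes/pki

    # On the new master: restore them before running kubeadm init
    sudo tar -xzf kubernetes-pki.tar.gz -C /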

You can also have a look at Options for Highly Available Topology.

-- Crou
Source: StackOverflow

6/11/2019

For a highly available Kubernetes cluster you'll need at least three master nodes. Please read the docs.
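
As a sketch, with kubeadm the first control-plane node is initialized against a load-balanced endpoint and the remaining masters join as control-plane members. LOAD_BALANCER_DNS and the token/hash/key values are placeholders (kubeadm init prints the real join command), and on kubeadm 1.14 the upload-certs and control-plane flags are spelled with an experimental- prefix:

    # On the first master: put the API server behind a load balancer
    sudo kubeadm init \
        --control-plane-endpoint "LOAD_BALANCER_DNS:6443" \
        --upload-certs

    # On each additional master: run the control-plane join command
    # that kubeadm init printed, roughly of this shape
    sudo kubeadm join LOAD_BALANCER_DNS:6443 \
        --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash> \
        --control-plane --certificate-key <key>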

-- mdaguete
Source: StackOverflow