Kubernetes Cluster Resource Recovery

8/14/2019

Recently, multiple master nodes went down, which took out the entire k8s cluster. I was able to recover the cluster using kops. However, the cluster is now missing all the deployments, pods, namespaces, and other resources it had before.

From what I have read, to restore the cluster I need to restore the etcd state. Since I used kops 1.3, I have backups in s3://my.clusters/test.my.clusters/backups/etcd/events and s3://my.clusters/test.my.clusters/backups/etcd/main. However, I'm having trouble using these to recover my cluster. For example, the documentation tells me to run etcd-manager-ctl, but where do I run it from, and how do I install etcd-manager and connect it to my running etcd containers? Currently my cluster's master node shows the following etcd containers:

sudo docker ps | grep etcd
abcd        k8s.gcr.io/etcd@sha256:123456     "/bin/sh -c 'mkfif..."               k8s_etcd-container_etcd-server-events-ip-10-10-63-66.ec2.internal_kube-system_678
dfsd        k8s.gcr.io/etcd@sha256:asdvce     "/bin/sh -c 'mkfif..."               k8s_etcd-container_etcd-server-ip-10-10-63-66.ec2.internal_kube-system_939
fdhg        k8s.gcr.io/pause-amd64:3.0        "/pause"                             k8s_POD_etcd-server-events-ip-10-10-63-66.ec2.internal_kube-system_678
mbnc        k8s.gcr.io/pause-amd64:3.0        "/pause"                             k8s_POD_etcd-server-ip-10-10-63-66.ec2.internal_kube-system_939
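
For reference, this is roughly how I understand the documented invocation, written as a dry-run sketch. The backup paths are the ones listed above. I am assuming the --backup-store flag and the list-backups subcommand as described in the kops etcd backup documentation. The run helper and the DRY_RUN switch are placeholders I made up so the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints the etcd-manager-ctl commands for both kops etcd clusters.
# BACKUP_ROOT comes from the backup locations in this question.
BACKUP_ROOT="s3://my.clusters/test.my.clusters/backups/etcd"

# DRY_RUN is a hypothetical switch for this sketch; by default nothing is executed.
DRY_RUN="${DRY_RUN:-1}"

# run: echo the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# kops keeps two etcd clusters ("main" and "events"), each with its own backup store.
for cluster in main events; do
    # Assumed from the kops docs: --backup-store plus the list-backups subcommand.
    run etcd-manager-ctl --backup-store="${BACKUP_ROOT}/${cluster}" list-backups
done
```

Setting DRY_RUN=0 would actually execute the commands, which requires the etcd-manager-ctl binary and AWS credentials with access to the bucket. As far as I can tell, the restore step then takes one of the listed backup names, but where to run it from is exactly the part I am unsure about.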

Additional details: I'm running all my nodes on AWS EC2 instances. My cluster has 3 worker nodes and 3 master nodes. Any help would be appreciated.

-- dredbound
amazon-ec2
docker
etcd
kops
kubernetes

0 Answers