Recently multiple master nodes went down, which took out my entire k8s cluster. I was able to recover the cluster using kops. However, the recovered cluster is missing all the deployments, pods, namespaces and other resources it had before.
From what I have read, in order to restore the cluster I need to restore the etcd state. Since I used kops 1.3, I have backups in s3://my.clusters/test.my.clusters/backups/etcd/events and s3://my.clusters/test.my.clusters/backups/etcd/main. What I'm having trouble with is using these backups to recover my cluster. For example, the documentation tells me to run etcd-manager-ctl, but where do I run it from, and how do I install etcd-manager and connect it to my running etcd containers? Currently my cluster's master node shows the following etcd containers:
sudo docker ps | grep etcd
abcd k8s.gcr.io/etcd@sha256:123456 "/bin/sh -c 'mkfif..." k8s_etcd-container_etcd-server-events-ip-10-10-63-66.ec2.internal_kube-system_678
dfsd k8s.gcr.io/etcd@sha256:asdvce "/bin/sh -c 'mkfif..." k8s_etcd-container_etcd-server-ip-10-10-63-66.ec2.internal_kube-system_939
fdhg k8s.gcr.io/pause-amd64:3.0 "/pause" k8s_POD_etcd-server-events-ip-10-10-63-66.ec2.internal_kube-system_678
mbnc k8s.gcr.io/pause-amd64:3.0 "/pause" k8s_POD_etcd-server-ip-10-10-63-66.ec2.internal_kube-system_939
Additional details: I'm running all my nodes on AWS EC2 instances. My cluster has 3 worker nodes and 3 master nodes. Any help would be appreciated.