I have a Kubernetes cluster (v1.5.6) with a 3-node etcd cluster (etcd version 3.1.5) on VMware. The etcd nodes run in three Docker containers (one per host) on CoreOS on VMware.
I backed up etcd with the following command:
docker run --rm --net=host -v /tmp:/etcd_backup -e ETCDCTL_API=3 quay.io/coreos/etcd:v3.1.5 etcdctl --endpoints=[1.1.1.1:2379,2.2.2.2:2379,3.3.3.3:2379] snapshot save etcd_backup/snapshot.db
The backup completed successfully.
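As a sanity check, I can inspect the snapshot with etcdctl's snapshot status subcommand (it reports the hash, revision, and key count), using the same image and mount as above:
docker run --rm --net=host -v /tmp:/etcd_backup -e ETCDCTL_API=3 quay.io/coreos/etcd:v3.1.5 \
etcdctl snapshot status /etcd_backup/snapshot.db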
I want to recreate this Kubernetes cluster from scratch in another VMware environment, and to do that I need to restore etcd from the snapshot.
So far I have not found a working approach for etcd running in Docker containers.
I tried to restore with the following method, but unfortunately did not succeed.
First, I created a new etcd node by running the following command:
docker run --rm --net=host -v /tmp/etcd_bak:/etcd_backup -e ETCDCTL_API=3 registry:5000/quay.io/coreos/etcd:v3.1.5 etcdctl snapshot restore etcd_backup/snapshot.db --name etcd0 --initial-cluster etcd0=http://etcd0:2380,etcd1=http://etcd1:2380,etcd2=http://etcd2:2380 --initial-cluster-token etcd-cluster-1 --initial-advertise-peer-urls http://etcd0:2380
Result:
2018-06-04 09:25:52.314747 I | etcdserver/membership: added member 7ff5c9c6942f82e [http://etcd0:2380] to cluster 5d1b637f4b7740d5
2018-06-04 09:25:52.314940 I | etcdserver/membership: added member 91b417e7701c2eeb [http://etcd2:2380] to cluster 5d1b637f4b7740d5
2018-06-04 09:25:52.315096 I | etcdserver/membership: added member faeb78734ee4a93d [http://etcd1:2380] to cluster 5d1b637f4b7740d5
Unfortunately, nothing happens after that.
What is the correct way to restore the etcd backup?
How do I create an empty etcd cluster/node, and how should I restore the snapshot into it?
According to the etcd disaster recovery documentation, you need to restore all three etcd nodes from the snapshot with commands like yours, then start the three members with commands like this:
etcd \
--name m1 \
--listen-client-urls http://host1:2379 \
--advertise-client-urls http://host1:2379 \
--listen-peer-urls http://host1:2380 &
You can also extract etcdctl from the image, like this:
docker run --rm -v /opt/bin:/opt/bin registry:5000/quay.io/coreos/etcd:v3.1.5 cp /usr/local/bin/etcdctl /opt/bin
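You can then confirm the extracted binary works:
ETCDCTL_API=3 /opt/bin/etcdctl version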
Then use etcdctl to restore the snapshot:
# ETCDCTL_API=3 ./etcdctl snapshot restore snapshot.db \
--name m1 \
--initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://host1:2380 \
--data-dir /var/lib/etcd
This restores the snapshot into the /var/lib/etcd directory. Then start etcd with Docker; don't forget to mount /var/lib/etcd into your container and point --data-dir at it.
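For example, a sketch for member m1, assuming the image from the question and the host1/m1 names used above (adjust the mount and URLs to your environment):
docker run -d --net=host \
-v /var/lib/etcd:/var/lib/etcd \
registry:5000/quay.io/coreos/etcd:v3.1.5 \
/usr/local/bin/etcd \
--name m1 \
--data-dir /var/lib/etcd \
--listen-client-urls http://host1:2379 \
--advertise-client-urls http://host1:2379 \
--listen-peer-urls http://host1:2380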
etcd in Kubernetes runs in Docker containers; here is what I did to recover the cluster:
Retrieve the etcd cluster metadata
docker inspect etcd1
You'll get something like this:
"Binds": [
"/etc/ssl/certs:/etc/ssl/certs:ro",
"/etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro",
"/var/lib/etcd:/var/lib/etcd:rw"
],
...
"Env": [
"ETCD_DATA_DIR=/var/lib/etcd",
"ETCD_ADVERTISE_CLIENT_URLS=https://172.16.60.1:2379",
"ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.16.60.1:2380",
"ETCD_INITIAL_CLUSTER_STATE=existing",
"ETCD_METRICS=basic",
"ETCD_LISTEN_CLIENT_URLS=https://172.16.60.1:2379,https://127.0.0.1:2379",
"ETCD_ELECTION_TIMEOUT=5000",
"ETCD_HEARTBEAT_INTERVAL=250",
"ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd",
"ETCD_LISTEN_PEER_URLS=https://172.16.60.1:2380",
"ETCD_NAME=etcd1",
"ETCD_PROXY=off",
"ETCD_INITIAL_CLUSTER=etcd1=https://172.16.60.1:2380,etcd2=https://172.16.60.2:2380,etcd3=https://172.16.60.2:2380",
"ETCD_AUTO_COMPACTION_RETENTION=8",
"ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem",
"ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node01.pem",
"ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node01-key.pem",
"ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem",
"ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node01.pem",
"ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node01-key.pem",
"ETCD_PEER_CLIENT_CERT_AUTH=true",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"/usr/local/bin/etcd"
],
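To pull just those fields out of the inspect output, assuming jq is installed on the host:
docker inspect etcd1 | jq '.[0].HostConfig.Binds, .[0].Config.Env, .[0].Config.Cmd'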
Copy the etcd snapshot db to the other etcd nodes
scp snapshotdb_20180913 node02:/root/
scp snapshotdb_20180913 node03:/root/
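Optionally verify the copies are intact, assuming sha256sum is available on all hosts (the three hashes should match):
sha256sum snapshotdb_20180913
ssh node02 sha256sum /root/snapshotdb_20180913
ssh node03 sha256sum /root/snapshotdb_20180913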
Rebuild the cluster with the original info
# etcd1
docker stop etcd1
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
--cacert /etc/ssl/etcd/ssl/ca.pem \
--cert /etc/ssl/etcd/ssl/member-node01.pem \
--key /etc/ssl/etcd/ssl/member-node01-key.pem \
--name etcd1 \
--initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
--initial-cluster-token k8s_etcd \
--initial-advertise-peer-urls https://node01:2380 \
--data-dir /var/lib/etcd
# etcd2
docker stop etcd2
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
--cacert /etc/ssl/etcd/ssl/ca.pem \
--cert /etc/ssl/etcd/ssl/member-node02.pem \
--key /etc/ssl/etcd/ssl/member-node02-key.pem \
--name etcd2 \
--initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
--initial-cluster-token k8s_etcd \
--initial-advertise-peer-urls https://node02:2380 \
--data-dir /var/lib/etcd
# etcd3
docker stop etcd3
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore snapshotdb_20180913 \
--cacert /etc/ssl/etcd/ssl/ca.pem \
--cert /etc/ssl/etcd/ssl/member-node03.pem \
--key /etc/ssl/etcd/ssl/member-node03-key.pem \
--name etcd3 \
--initial-cluster etcd1=https://node01:2380,etcd2=https://node02:2380,etcd3=https://node03:2380 \
--initial-cluster-token k8s_etcd \
--initial-advertise-peer-urls https://node03:2380 \
--data-dir /var/lib/etcd
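snapshot restore creates a fresh member directory under the data dir, so a quick sanity check on each node before starting anything:
ls /var/lib/etcd/member
# should list: snap  wal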
Start the containers and check the cluster status
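The start commands themselves, assuming the original containers were only stopped (not removed) and therefore keep the environment shown by docker inspect:
docker start etcd1   # on node01
docker start etcd2   # on node02
docker start etcd3   # on node03
Because the restored data dir already contains the cluster membership, etcd ignores the ETCD_INITIAL_CLUSTER* settings on restart and comes up with the restored state.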
cd /etc/ssl/etcd/ssl
ETCDCTL_API=3 etcdctl \
--endpoints=https://node01:2379 \
--cacert=./ca.pem \
--cert=./member-node01.pem \
--key=./member-node01-key.pem \
member list
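You can also run a health check against all three members (v3 API; the same certs are assumed to be accepted by every member):
ETCDCTL_API=3 etcdctl \
--endpoints=https://node01:2379,https://node02:2379,https://node03:2379 \
--cacert=./ca.pem \
--cert=./member-node01.pem \
--key=./member-node01-key.pem \
endpoint health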