We have a Rancher installation running from the Docker image, version v2.2.1.
Lately we started getting the log message “Failed to update lock: etcdserver: mvcc: database space exceeded”.
We checked etcd for our cluster, and everything looks OK there.
Then we noticed that the etcd db inside the Rancher Docker container looks like this,
in the directory /var/lib/rancher/management-state/etcd/member/snap:
2.1G Jul 17 22:29 db
but we cannot compact or otherwise interact with it.
Why does the Rancher Docker image have an etcd db of its own? Isn't the cluster's etcd enough?
And how can we keep it small in order to solve the problem?
Thanks in advance.
Same here with a single-node installation of rancher/rancher:stable (997af25b7b54). You can run etcdctl in a utility container on the same Docker host as your Rancher container:
docker run --net=container:<NAME_OF_RANCHER_CONTAINER> -id --name etcd-utility rancher/rke-tools:v0.1.40
Because this container uses the network of the Rancher container, localhost in the output below refers to the Rancher container.
docker exec etcd-utility etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://localhost:2380 clientURLs=http://localhost:2379 isLeader=true
Now, after attaching to the etcd-utility container, you can fix the issue with your etcd like this (no output pasted):
host# docker exec -it etcd-utility bash
bash-4.4# export ETCDCTL_API=3
bash-4.4# etcdctl endpoint status --endpoints=$(etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --write-out table
bash-4.4# etcdctl compact `etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*'`
bash-4.4# etcdctl defrag
bash-4.4# etcdctl alarm list
bash-4.4# etcdctl alarm disarm
These steps follow the etcd troubleshooting guide for cluster etcd, which is explained in detail in the Rancher docs under etcd-space-errors. For a single-node install, see the links in this comment.
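If you need to repeat this, the compact/defrag/disarm steps could be wrapped in a small script along these lines. This is only a sketch: the function names extract_revision and compact_and_defrag are made up for illustration, and it assumes etcdctl is on the PATH and reaches the embedded etcd on its default localhost endpoint, as in the etcd-utility container above. It reads the revision once and reuses it, rather than querying it separately for each command.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of etcd or Rancher.

# Read `etcdctl endpoint status --write-out json` on stdin and print the
# first "revision" value found (one per endpoint; a single node has one).
extract_revision() {
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Compact up to the current revision, defragment, then clear the alarm.
# Assumes etcdctl can reach etcd on its default endpoint.
compact_and_defrag() {
    export ETCDCTL_API=3
    rev=$(etcdctl endpoint status --write-out json | extract_revision)
    if [ -z "$rev" ]; then
        echo "could not determine current revision" >&2
        return 1
    fi
    etcdctl compact "$rev" &&
        etcdctl defrag &&
        etcdctl alarm disarm &&
        etcdctl alarm list
}
```

After sourcing the file inside the etcd-utility container, run compact_and_defrag. The final alarm list should print nothing once the NOSPACE alarm has been cleared.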