Rancher's internal etcd db cannot be cleaned


We have a Rancher installation running from the Docker image, version v2.2.1.

Lately we started getting log entries like “Failed to update lock: etcdserver: mvcc: database space exceeded”.

Checking the etcd of the cluster we manage, everything looks OK.

(screenshot: etcd status)

Then I noticed that the etcd db inside the Rancher Docker container looks like this,

inside the directory /var/lib/rancher/management-state/etcd/member/snap:

2.1G Jul 17 22:29 db

but I cannot compact it or interact with it.

Why does the Rancher Docker image have an etcd db of its own? Isn't the cluster's etcd enough?

And how can we keep it small in order to solve the problem?

Thanks in advance

-- user2509196

1 Answer


Same here with a single-node installation of rancher/rancher:stable (997af25b7b54). You can run etcdctl in a utility container on the same Docker host as your Rancher container:

docker run --net=container:<NAME_OF_RANCHER_CONTAINER> -id --name etcd-utility rancher/rke-tools:v0.1.40

Because this container shares the Rancher container's network namespace, localhost in the output below refers to the Rancher container:

docker exec etcd-utility etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://localhost:2380 clientURLs=http://localhost:2379 isLeader=true

Now attach to the etcd-utility container and fix the etcd issue like this (output omitted):

host# docker exec -it etcd-utility bash

bash-4.4# export ETCDCTL_API=3
bash-4.4# etcdctl endpoint status --endpoints=$(etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',') --write-out table
bash-4.4# etcdctl compact `etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*'`
bash-4.4# etcdctl defrag
bash-4.4# etcdctl alarm list
bash-4.4# etcdctl alarm disarm

The compact, defrag and alarm steps follow the etcd troubleshooting guide for cluster etcd, which is explained in detail in the Rancher docs under etcd-space-errors. For single-node installations, see the links in this comment.
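
To make the two parsing pipelines in the commands above easier to check before running them against a live etcd, here is a minimal sketch that applies them to sample text instead. The function names member_endpoints and extract_revision are hypothetical helpers, not part of etcdctl, and the sample member line and JSON payload are illustrative values that assume the comma-separated v3 "member list" format and the JSON layout of "endpoint status".

```shell
#!/bin/sh
# Hypothetical helper: turn v3-style `etcdctl member list` output (stdin)
# into a comma-separated list of client URLs (5th comma-separated field).
member_endpoints() {
  cut -d, -f5 | sed -e 's/ //g' | paste -sd ',' -
}

# Hypothetical helper: pull the revision number out of
# `etcdctl endpoint status --write-out json` output (stdin).
# head -n 1 keeps a single value even if several endpoints are reported.
extract_revision() {
  grep -E -o '"revision":[0-9]+' | grep -E -o '[0-9]+' | head -n 1
}

# Illustrative inputs only; the values below are made up for this example.
members='8e9e05c52164694d, started, default, http://localhost:2380, http://localhost:2379'
status='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":12345,"raft_term":3},"dbSize":2254857830}}]'

endpoints=$(printf '%s\n' "$members" | member_endpoints)
rev=$(printf '%s\n' "$status" | extract_revision)

echo "endpoints: $endpoints"
echo "revision:  $rev"
```

Inside the utility container, the same helpers could be fed from the real commands, for example etcdctl endpoint status --write-out json | extract_revision, with ETCDCTL_API=3 exported as shown above.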

-- Schuh
