etcd files in /var/lib/etcd/member/snap

6/29/2018

Our Kubernetes cluster recently crashed because of an etcd "database size exceeded" error.

We managed to bring everything back up with a "simple" defrag of the etcd cluster endpoints (see here).

Unfortunately, not everything is back to normal yet. In particular, the /var/lib/etcd/member/snap directory of one etcd endpoint looks like this:

total 25G
-rw-r--r--. 1 etcd root   21K Jun 29 20:27 00000000000810cb-00000000066072bc.snap
-rw-r--r--. 1 etcd root   21K Jun 29 20:40 00000000000810cb-00000000066099cd.snap
-rw-r--r--. 1 etcd root   21K Jun 29 20:55 00000000000810df-000000000660c0de.snap
-rw-r--r--. 1 etcd root   21K Jun 29 21:19 000000000008113f-000000000660e7ef.snap
-rw-r--r--. 1 etcd root   21K Jun 29 21:37 0000000000081162-0000000006610f00.snap
-rw-------. 1 etcd root  916M Jun 29 15:40 000000000619e354.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:41 00000000061b9704.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:43 00000000061ca269.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:44 00000000061dbb43.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:47 00000000061e40df.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:48 00000000061e8192.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:49 00000000061f8799.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:49 0000000006200018.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:52 0000000006225cfd.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:53 00000000062323d6.snap.db
-rw-------. 1 etcd root  916M Jun 29 15:53 00000000062396fa.snap.db
-rw-------. 1 etcd root  970M Jun 29 15:54 000000000624dfe7.snap.db
-rw-------. 1 etcd root 1003M Jun 29 15:54 0000000006259f3f.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 15:58 0000000006296ff0.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 15:59 000000000629b9bc.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:01 00000000062b02c0.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:02 00000000062bef04.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:05 00000000062db8e2.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:09 00000000062ef4c4.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:11 00000000062fce7a.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:12 00000000063063c1.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:12 000000000630b648.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:12 000000000630bdbb.snap.db
-rw-------. 1 etcd root  1.2G Jun 29 16:13 000000000630e3f1.snap.db
-rw-------. 1 etcd root   27M Jun 29 21:41 db

This is the only endpoint in this state, and it is running low on disk space.

I could not find any documentation about these files, and in particular none on how to reduce the overall disk footprint in our case (defrag, compaction, etc. have all been tried).

What are these files? How can we reduce the disk footprint of this endpoint (that is, get rid of those huge snap.db files)?

-- Emmenemoi
etcd
kubernetes
snapshot

1 Answer

6/30/2018

They seem to be snapshots of a given state of your etcd cluster over time.

It sounds like they can be rotated, at least according to this pull request:

etcdserver: purge old snap.db files #7967 https://github.com/coreos/etcd/pull/7967
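
To get a feel for what such a rotation would reclaim, here is a minimal dry-run sketch in Python. It is not etcd's own purge logic: the function names, the keep-newest-N policy and the default of 5 are assumptions made for illustration. It relies on the snap.db names being fixed-width hex indexes, as in the listing above, so sorting by name orders them oldest to newest. It only reports candidates and deletes nothing; whether removing them by hand is safe on a running member is not something I can confirm.

```python
from pathlib import Path


def stale_snap_dbs(snap_dir, keep=5):
    """Return the *.snap.db files a keep-newest-N rotation would remove.

    Dry run only: nothing is deleted. `keep` is a hypothetical parameter,
    not an etcd setting.
    """
    # Assumption: names are fixed-width hex indexes, so a lexicographic
    # sort orders the files from oldest to newest.
    files = sorted(Path(snap_dir).glob("*.snap.db"), key=lambda p: p.name)
    # files[:-0] would be empty, so keep=0 is handled separately.
    return files[:-keep] if keep > 0 else files


def total_size(paths):
    """Sum the on-disk size, in bytes, of the given files."""
    return sum(p.stat().st_size for p in paths)
```

Calling stale_snap_dbs("/var/lib/etcd/member/snap", keep=5) and passing the result to total_size would show how much of the 25G is tied up in older snap.db files before you decide what to do with them. The small .snap files and the live db file are not matched by the pattern.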

Hope this answer is helpful somehow.

-- the_marcelo_r
Source: StackOverflow