raft: tocommit(331448) is out of range. Was the raft log corrupted, truncated, or lost?

10/11/2018

I have a multi-master Kubernetes cluster on AWS, deployed with kops. I took a snapshot of the master's EBS volume, then killed the master and attached to it a new volume created from that snapshot.

I have this output on the master node in the etcd-server-events.log:

2018-10-11 10:51:29.065103 I | etcdserver: restarting member abb6cb1e71aa378a in cluster 497962ab73477c7c at commit index 325740
2018-10-11 10:51:29.065583 I | raft: abb6cb1e71aa378a became follower at term 377
2018-10-11 10:51:29.065650 I | raft: newRaft abb6cb1e71aa378a [peers: [1a8985a0df7217f,755109cbc9ac97c4,abb6cb1e71aa378a], term: 377, commit: 325740, applied: 320032, lastindex: 325740, lastterm: 377]
2018-10-11 10:51:29.072556 I | etcdserver: starting server... [version: 2.2.1, cluster version: 2.2]
2018-10-11 10:51:29.097569 I | rafthttp: the connection with 755109cbc9ac97c4 became active
2018-10-11 10:51:29.097853 I | rafthttp: the connection with 1a8985a0df7217f became active
2018-10-11 10:51:29.122129 C | raft: tocommit(331448) is out of range [lastIndex(325740)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(331448) is out of range [lastIndex(325740)]. Was the raft log corrupted, truncated, or lost?

goroutine 40 [running]:
github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc8200f3fc0, 0xe93e40, 0x5d, 0xc820a1e900, 0x2, 0x2)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/pkg/capnslog/pkg_logger.go:73 +0x191
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc8204435e0, 0x50eb8)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/log.go:184 +0x1a6
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc8200c2680, 0x8, 0xabb6cb1e71aa378a, 0x1a8985a0df7217f, 0x179, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/raft.go:694 +0x44
github.com/coreos/etcd/raft.stepFollower(0xc8200c2680, 0x8, 0xabb6cb1e71aa378a, 0x1a8985a0df7217f, 0x179, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/raft.go:659 +0x11b2
github.com/coreos/etcd/raft.(*raft).Step(0xc8200c2680, 0x8, 0xabb6cb1e71aa378a, 0x1a8985a0df7217f, 0x179, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/raft.go:508 +0x2a8
github.com/coreos/etcd/raft.(*node).run(0xc820995040, 0xc8200c2680)
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/node.go:285 +0x8eb
created by github.com/coreos/etcd/raft.RestartNode
        /home/vagrant/gopath/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/raft/node.go:190 +0x2e3
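
For reference, the trace shows the panic being raised in raftLog.commitTo while the follower handles a heartbeat from the leader. The argument 0x50eb8 is 331448 in decimal, the tocommit value in the message. The sketch below is a simplified illustration of that bounds check, not etcd's actual code: the struct and its fields are stand-ins, and it returns an error where etcd panics. It suggests the restored member's log ends at 325740 while the rest of the cluster has already committed up to 331448, which would be consistent with the volume having been restored from an older snapshot.

```go
package main

import "fmt"

// raftLog is a simplified stand-in for etcd's raft log. Only the two
// indexes relevant to the panic are modeled (an assumption for illustration).
type raftLog struct {
	committed uint64 // highest index this member knows to be committed
	lastIndex uint64 // index of the last entry this member actually holds
}

// commitTo sketches the bounds check: a follower cannot mark as committed
// an entry it does not have. etcd panics here; this sketch returns an error.
func (l *raftLog) commitTo(tocommit uint64) error {
	if l.committed >= tocommit {
		return nil // the commit index never decreases
	}
	if l.lastIndex < tocommit {
		return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]", tocommit, l.lastIndex)
	}
	l.committed = tocommit
	return nil
}

func main() {
	// Values from the log above: the restarted member reports
	// commit 325740 and lastindex 325740; the heartbeat asks for 331448.
	l := &raftLog{committed: 325740, lastIndex: 325740}
	if err := l.commitTo(331448); err != nil {
		fmt.Println(err)
	}
}
```

With the values from the log, the check fails because 325740 < 331448, so the member cannot rejoin with the data it has on the restored volume.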

The etcd-server-events pod on that master node is in CrashLoopBackOff status, although all the nodes are ready and running. The etcd member on this master is not working. Any suggestions? How can I resolve this?

-- falberto89
amazon-web-services
disaster-recovery
etcd
kops
kubernetes

0 Answers