How does Consul recover from losing quorum with changing node IPs?

2/27/2019

I deployed Consul servers using the Helm chart, giving me a three-node cluster. I could view the IP addresses and IDs of the nodes:

$ consul catalog nodes
Node             ID        Address     DC
consul-server-0  065ab1e4  10.60.1.11  dc1
consul-server-1  46eca681  10.60.0.16  dc1
consul-server-2  fb5fa37d  10.60.2.8   dc1

As a test, I force-deleted all three server pods as follows:

kubectl delete pods -n consul --force --grace-period=0 consul-server-0 consul-server-1 consul-server-2

Three new pods came up with different IPs but the same IDs, joined the cluster and achieved consensus again:

$ consul catalog nodes
Node             ID        Address     DC
consul-server-0  065ab1e4  10.60.1.12  dc1
consul-server-1  46eca681  10.60.2.9   dc1
consul-server-2  fb5fa37d  10.60.0.17  dc1

What does Consul rely on to recover from this situation? Can it form quorum again because the IDs are the same, and then work out between the servers that the IPs have changed? Or is it also a requirement for automatic recovery that the node names stay consistent?

I see log messages such as:

consul: removed server with duplicate ID: 46eca681-b5d6-21e7-3df5-cf228ffdd02c

So it seems the changed IP address causes a new node to be added to the cluster, and Consul then works out that the old entry needs to be removed. Because of this, I would expect there to be 6 nodes at one point, with 3 unavailable, causing the cluster to lose quorum and be unable to recover automatically. But this does not happen.
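To make the arithmetic behind that expectation concrete: Raft needs a majority of voting servers, i.e. floor(N/2) + 1. The following is only a rough sketch under the assumption that all six entries (three stale, three new) would count as voters at the same time, which the "duplicate ID" log line suggests is not what actually happens. The function names are made up for illustration and are not part of Consul.

```python
def quorum_size(voters: int) -> int:
    """Majority of voters required by Raft: floor(N/2) + 1."""
    return voters // 2 + 1


def has_quorum(voters: int, available: int) -> bool:
    """True if enough voters are reachable to form a majority."""
    return available >= quorum_size(voters)


# Healthy three-server cluster: quorum is 2, and all 3 are reachable.
print(quorum_size(3), has_quorum(3, 3))

# Hypothetical worst case (assumption): 6 voters registered, only the
# 3 new pods reachable. Quorum would be 4, so 3 is not enough.
print(quorum_size(6), has_quorum(6, 3))
```

If the stale entries are replaced rather than added alongside the new ones, the voter count stays at 3 and the three new pods are enough for a majority.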

-- dippynark
consul
kubernetes

1 Answer

3/1/2019

We also run Consul in Docker Swarm, and recovery after a failure is not a trivial problem, because a failed server is recreated in a new container, naturally with a different IP. Consul emits a lot of errors and messages about Raft, but I have not seen it cause a serious problem. I just filter out this kind of log message and do not ship it to the long-lived indexes in Elasticsearch.

We use the following config for faster server recovery:

{
  "skip_leave_on_interrupt": true,
  "leave_on_terminate": true,
  "disable_update_check": true,
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "1s"
  }
}

You can review these parameters in the Consul agent configuration documentation.

-- ozlevka
Source: StackOverflow