I am trying to create a simple Redis high-availability setup with 1 master, 1 slave, and 2 sentinels.
The setup works perfectly when failing over from redis-master to redis-slave. When redis-master recovers, it correctly registers itself as a slave of the new master, redis-slave.
However, when redis-slave, now acting as master, goes down, redis-master cannot return as master. The log of redis-master goes into a loop showing:
1:S 12 Dec 11:12:35.073 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:35.073 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:35.074 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:35.075 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:35.076 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:36.081 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:36.081 * MASTER <-> SLAVE sync started
1:S 12 Dec 11:12:36.082 * Non blocking connect for SYNC fired the event.
1:S 12 Dec 11:12:36.082 * Master replied to PING, replication can continue...
1:S 12 Dec 11:12:36.083 * Trying a partial resynchronization (request 684581a36d134a6d50f1cea32820004a5ccf3b2d:285273).
1:S 12 Dec 11:12:36.084 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 12 Dec 11:12:37.087 * Connecting to MASTER 10.102.1.92:6379
1:S 12 Dec 11:12:37.088 * MASTER <-> SLAVE sync started
...
The Replication documentation states that:
Since Redis 4.0, when an instance is promoted to master after a failover, it will be still able to perform a partial resynchronization with the slaves of the old master.
But the log seems to show otherwise. A more detailed version of the log, showing both the first redis-master to redis-slave failover and the subsequent redis-slave to redis-master failover, is available here.
Any idea what's going on? What do I have to do to allow redis-master to return to the master role? Configuration details are provided below:
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
redis-master     ClusterIP   10.102.1.92     <none>        6379/TCP    11m
redis-slave      ClusterIP   10.107.0.73     <none>        6379/TCP    11m
redis-sentinel   ClusterIP   10.110.128.95   <none>        26379/TCP   11m
redis-master configuration:
requirepass test1234
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-master.fp8-cache
slave-announce-port 6379

redis-slave configuration:
requirepass test1234
slaveof redis-master.fp8-cache 6379
masterauth test1234
dir /data
tcp-keepalive 60
maxmemory-policy noeviction
appendonly no
bind 0.0.0.0
save 900 1
save 300 10
save 60 10000
slave-announce-ip redis-slave.fp8-cache
slave-announce-port 6379
It turns out that the problem is related to the use of a host name instead of an IP address:
slaveof redis-master.fp8-cache 6379
...
slave-announce-ip redis-slave.fp8-cache
So, when the master came back as a slave, Sentinel showed two slaves: one listed by IP address and another by host name. I am not sure exactly how these two slave entries (which point to the same Redis server) cause the problem above. Now that I have changed the config to use IP addresses instead of host names, the Redis HA setup is working flawlessly.
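For illustration, the changed directives might look roughly like the sketch below. It assumes the ClusterIPs from the service listing in the question (10.102.1.92 for redis-master, 10.107.0.73 for redis-slave) are the addresses to use; the exact values in the working setup are not shown here, so treat these as placeholders. All other directives stay as in the original configs.

```conf
# --- redis-master config (only the changed directive) ---
# Announce by IP instead of redis-master.fp8-cache.
# 10.102.1.92 is assumed from the redis-master ClusterIP in the service listing.
slave-announce-ip 10.102.1.92
slave-announce-port 6379

# --- redis-slave config (only the changed directives) ---
# Replicate from the master's IP instead of its host name.
slaveof 10.102.1.92 6379
# Announce by IP instead of redis-slave.fp8-cache.
# 10.107.0.73 is assumed from the redis-slave ClusterIP in the service listing.
slave-announce-ip 10.107.0.73
slave-announce-port 6379
```

With every instance announced under a single address, Sentinel should only have one entry per Redis server to track, which is presumably why the duplicate slave entry no longer appears.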