Pod deletion for StatefulSet of redis cluster doesn't restore the cluster-state of redis

6/14/2019

I have used redis:5.0.1-alpine in my StatefulSet. The StatefulSet has 6 pods, and the Redis cluster is formed using the command below:

redis-cli --cluster create {IPlist is placed here} --cluster-replicas 1

Now, if the pods get accidentally deleted, or AKS goes out of service, the pods created after AKS resumes will have different IPs.

I tried deliberately deleting the pods; when they are recreated, the cluster state changes to "fail" (it was "ok" when the cluster was initially created).

Also, when I try to access the old data set in the cluster, I get an error saying "cluster is down".
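The failure mode can be confirmed from inside any pod; a minimal sketch, assuming the pods follow the usual StatefulSet naming (`redis-cluster-0`) and carry the label `app=redis-cluster` (both are assumptions, not from the question):

```shell
# Check the overall cluster state from inside one of the pods
# (pod name and label are assumed, adjust to your StatefulSet).
kubectl exec -it redis-cluster-0 -- redis-cli cluster info | grep '^cluster_state'
# A broken cluster reports cluster_state:fail

# Compare the IPs Redis remembers with the pods' current IPs:
kubectl exec -it redis-cluster-0 -- redis-cli cluster nodes
kubectl get pods -l app=redis-cluster -o wide
```

If the IPs in `cluster nodes` differ from those shown by `kubectl get pods -o wide`, the nodes are gossiping to stale addresses.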

Below is the ConfigMap containing the redis.conf used for cluster creation:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster
  namespace: redis
data:
  update-node.sh: |
    #!/bin/sh
    REDIS_NODES="/data/nodes.conf"
    sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${REDIS_NODES}
    exec "$@"
  redis.conf: |+
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    protected-mode no
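For the `update-node.sh` wrapper above to take effect, the StatefulSet must run it as the container entrypoint and inject the pod IP. A sketch of the relevant container spec, with names and mount paths assumed to mirror the ConfigMap above (not taken from the question):

```yaml
# Assumed container section of the StatefulSet: update-node.sh rewrites the
# "myself" line in nodes.conf with the pod's current IP, then execs redis-server.
containers:
- name: redis
  image: redis:5.0.1-alpine
  command: ["/conf/update-node.sh", "redis-server", "/conf/redis.conf"]
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  volumeMounts:
  - name: conf
    mountPath: /conf
  - name: data
    mountPath: /data
```

The `POD_IP` environment variable comes from the downward API (`status.podIP`), which is what the sed command in `update-node.sh` substitutes into `nodes.conf`.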

[screenshot: issue description]

[screenshot: Redis Cluster nodes and slots output]

-- Tushar Mahajan
azure-kubernetes
kubernetes-statefulset
redis-cluster

1 Answer

7/23/2019

When you restart a single pod, it comes up with a new IP, publishes it to the other pods, and they all update their configuration to reflect the IP change.

When all pods go down and come up at the same time (for example, when all nodes in the cluster are rebooted), the pods cannot talk to each other because the IPs in their nodes.conf files are wrong.

A possible solution is to update the IPs in nodes.conf on all running pods and then restart them one by one.

I did it by placing this script in each pod:

recover-pod.sh

#!/bin/sh
set -e

REDIS_NODES_FILE="/data/nodes.conf"
for redis_node_ip in "$@"
do
  # Look up each node's cluster ID by asking the node itself,
  # then rewrite the IP on that node's line in the local nodes.conf.
  redis_node_id=$(redis-cli -h "$redis_node_ip" -p 6379 cluster nodes | grep myself | awk '{print $1}')
  sed -i.bak -e "/^$redis_node_id/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${redis_node_ip}/" "${REDIS_NODES_FILE}"
done

And running it from one of the Kubernetes nodes:

recover-cluster.sh

#!/bin/bash

for i in {0..5}
do
  echo "Updating the correct IPs in nodes.conf on redis-cluster-redis-cluster-statefulset-$i"
  kubectl exec -it redis-cluster-redis-cluster-statefulset-$i -- /readonly-config/recover-pod.sh $(kubectl get pods -l app=redis-cluster -o jsonpath='{range .items[*]}{.status.podIP} {end}')
done

kubectl patch statefulset redis-cluster-redis-cluster-statefulset --patch '{"spec": {"template": {"metadata": {"labels": {"date": "'`date +%s`'" }}}}}'

This causes the Redis cluster to return to a healthy state.
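Once the patched StatefulSet has finished its rolling restart, recovery can be verified on every pod; a minimal sketch, assuming the same pod naming as in the scripts above:

```shell
#!/bin/bash
# Confirm every node reports a healthy cluster state after the rolling
# restart (pod names assumed from the answer above).
for i in {0..5}
do
  kubectl exec redis-cluster-redis-cluster-statefulset-$i -- \
    redis-cli cluster info | grep '^cluster_state'
done
# Every line should read: cluster_state:ok
```

If any pod still reports `cluster_state:fail`, rerunning recover-pod.sh on that pod and restarting it is a reasonable next step.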

-- Elad Tamary
Source: StackOverflow