Why doesn't clustering on k8s through redis-ha work?

4/25/2019

I'm trying to create a Redis cluster and use it from Node.js (ioredis/cluster), but that doesn't seem to work.
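On the client side, this is roughly what I mean by ioredis/cluster; a minimal sketch pointed at the service name the chart's notes print:

import Redis from 'ioredis';

// Redis.Cluster expects the servers to actually run in cluster mode,
// i.e. CLUSTER INFO has to work on them.
const cluster = new Redis.Cluster([
  { host: 'redis-test-redis-ha.yt.svc.cluster.local', port: 6379 },
]);

cluster.set('test', 'key').catch(err => console.error(err));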

The cluster is running v1.11.8-gke.6 on GKE.

I'm doing exactly what the redis-ha docs say:

 ~  helm install --set replicas=3 --name redis-test stable/redis-ha  
NAME:   redis-test
LAST DEPLOYED: Fri Apr 26 00:13:31 2019
NAMESPACE: yt
STATUS: DEPLOYED

RESOURCES:
==> v1/ConfigMap
NAME                           DATA  AGE
redis-test-redis-ha-configmap  3     0s
redis-test-redis-ha-probes     2     0s

==> v1/Pod(related)
NAME                          READY  STATUS    RESTARTS  AGE
redis-test-redis-ha-server-0  0/2    Init:0/1  0         0s

==> v1/Role
NAME                 AGE
redis-test-redis-ha  0s

==> v1/RoleBinding
NAME                 AGE
redis-test-redis-ha  0s

==> v1/Service
NAME                            TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)             AGE
redis-test-redis-ha             ClusterIP  None         <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-0  ClusterIP  10.7.244.34  <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-1  ClusterIP  10.7.251.35  <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-2  ClusterIP  10.7.252.94  <none>       6379/TCP,26379/TCP  0s

==> v1/ServiceAccount
NAME                 SECRETS  AGE
redis-test-redis-ha  1        0s

==> v1/StatefulSet
NAME                        READY  AGE
redis-test-redis-ha-server  0/3    0s


NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
redis-test-redis-ha.yt.svc.cluster.local

To connect to your Redis server:
1. Run a Redis pod that you can use as a client:

   kubectl exec -it redis-test-redis-ha-server-0 sh -n yt

2. Connect using the Redis CLI:

  redis-cli -h redis-test-redis-ha.yt.svc.cluster.local

 ~  k get pods | grep redis-test                                         
redis-test-redis-ha-server-0           2/2       Running   0          1m
redis-test-redis-ha-server-1           2/2       Running   0          1m
redis-test-redis-ha-server-2           2/2       Running   0          54s
 ~  kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
Defaulting container name to redis.
Use 'kubectl describe pod/redis-test-redis-ha-server-0 -n yt' to see all of the containers in this pod.
/data $ redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
redis-test-redis-ha.yt.svc.cluster.local:6379> set test key
(error) READONLY You can't write against a read only replica.

But in the end, only one random pod I connect to is writable. I checked the logs on a few containers and everything seems fine there. I tried running cluster info in redis-cli, but I get ERR This instance has cluster support disabled everywhere.
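For what it's worth, only the node whose INFO replication reports role:master accepts writes; the READONLY error comes from the read-only replicas. A quick sketch to check this from ioredis (hostname as above):

import Redis from 'ioredis';

const client = new Redis(6379, 'redis-test-redis-ha.yt.svc.cluster.local');

client.info('replication').then(info => {
  // "role:master" on the writable pod, "role:slave" on the read-only replicas
  console.log(info.split('\r\n').find(line => line.startsWith('role:')));
  client.disconnect();
});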

Logs:

 ~  k logs pod/redis-test-redis-ha-server-0  redis
1:C 25 Apr 2019 20:13:43.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:13:43.604 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:13:43.604 # Configuration loaded
1:M 25 Apr 2019 20:13:43.606 * Running mode=standalone, port=6379.
1:M 25 Apr 2019 20:13:43.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Apr 2019 20:13:43.606 # Server initialized
1:M 25 Apr 2019 20:13:43.606 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 25 Apr 2019 20:13:43.627 * DB loaded from disk: 0.021 seconds
1:M 25 Apr 2019 20:13:43.627 * Ready to accept connections
1:M 25 Apr 2019 20:14:11.801 * Replica 10.7.251.35:6379 asks for synchronization
1:M 25 Apr 2019 20:14:11.801 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are '8311a1ca896e97d5487c07f2adfd7d4ef924f36b' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:11.802 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:17.825 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:17.825 * Background RDB transfer started by pid 55
55:C 25 Apr 2019 20:14:17.826 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:17.926 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:17.926 # Slave 10.7.251.35:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:17.926 * Streamed RDB transfer with replica 10.7.251.35:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:18.828 * Synchronization with replica 10.7.251.35:6379 succeeded
1:M 25 Apr 2019 20:14:42.711 * Replica 10.7.252.94:6379 asks for synchronization
1:M 25 Apr 2019 20:14:42.711 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are 'af453adde824b2280ba66adb40cc765bf390e237' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:42.711 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:48.976 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:48.977 * Background RDB transfer started by pid 125
125:C 25 Apr 2019 20:14:48.978 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:49.077 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:49.077 # Slave 10.7.252.94:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:49.077 * Streamed RDB transfer with replica 10.7.252.94:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:49.761 * Synchronization with replica 10.7.252.94:6379 succeeded
 ~  k logs pod/redis-test-redis-ha-server-1 redis 
1:C 25 Apr 2019 20:14:11.780 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:14:11.781 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:14:11.781 # Configuration loaded
1:S 25 Apr 2019 20:14:11.786 * Running mode=standalone, port=6379.
1:S 25 Apr 2019 20:14:11.791 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 25 Apr 2019 20:14:11.791 # Server initialized
1:S 25 Apr 2019 20:14:11.791 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 25 Apr 2019 20:14:11.792 * DB loaded from disk: 0.001 seconds
1:S 25 Apr 2019 20:14:11.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 25 Apr 2019 20:14:11.792 * Ready to accept connections
1:S 25 Apr 2019 20:14:11.792 * Connecting to MASTER 10.7.244.34:6379
1:S 25 Apr 2019 20:14:11.792 * MASTER <-> REPLICA sync started
1:S 25 Apr 2019 20:14:11.792 * Non blocking connect for SYNC fired the event.
1:S 25 Apr 2019 20:14:11.793 * Master replied to PING, replication can continue...
1:S 25 Apr 2019 20:14:11.799 * Trying a partial resynchronization (request c2827ffe011d774db005a44165bac67a7e7f7d85:6006176).
1:S 25 Apr 2019 20:14:17.824 * Full resync from master: af453adde824b2280ba66adb40cc765bf390e237:722
1:S 25 Apr 2019 20:14:17.824 * Discarding previously cached master state.
1:S 25 Apr 2019 20:14:17.852 * MASTER <-> REPLICA sync: receiving streamed RDB from master
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Flushing old data
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Finished with success

What am I missing, or is there a better way to do clustering?

-- blits
deployment
kubectl
kubernetes
kubernetes-helm
redis

1 Answer

4/28/2019

Not the best solution, but I figured I could just use Sentinel instead of finding another way (or maybe there is no other way). It has support in most languages, so it shouldn't be very hard (except redis-cli, where I can't figure out how to query the Sentinel server).

This is how I got this done with ioredis (Node.js; sorry if you're not familiar with ES6 syntax):

import * as IORedis from 'ioredis';
import Redis from 'ioredis';
import { redisHost, redisPassword, redisPort } from './config';

export function getRedisConfig(): IORedis.RedisOptions {
  // I'm not sure how to set this properly.
  // ioredis/cluster automatically resolves all pods by hostname, but this doesn't,
  // so I have to explicitly list every pod (or resolve them all by hostname myself).
  return {
    sentinels: process.env.REDIS_CLUSTER.split(',').map(d => {
      const [host, port = 26379] = d.split(':');

      return { host, port: Number(port) };
    }),
    name: process.env.REDIS_MASTER_NAME || 'mymaster',
    ...(redisPassword ? { password: redisPassword } : {}),
  };
}

export async function initializeRedis() {
  if (process.env.REDIS_CLUSTER) {
    const cluster = new Redis(getRedisConfig());

    return cluster;
  }

  // For dev environment
  const client = new Redis(redisPort, redisHost);

  if (redisPassword) {
    await client.auth(redisPassword);
  }

  return client;
}
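
Usage is then the same as for a single node; ioredis asks Sentinel for the current master and routes all commands there. A quick sketch, assuming the snippet above lives in ./redis:

import { initializeRedis } from './redis';

async function main() {
  const redis = await initializeRedis();
  await redis.set('test', 'key'); // no READONLY error: writes always go to the master
  console.log(await redis.get('test'));
}

main().catch(console.error);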

In the deployment's env:

env:
  - name: REDIS_CLUSTER
    value: redis-redis-ha-server-1.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-0.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-2.redis-redis-ha.yt.svc.cluster.local:26379

You may want to protect it with a password.

-- blits
Source: StackOverflow