Accessing AWS ElastiCache (Redis CLUSTER mode) from different AWS accounts via AWS PrivateLink

7/2/2021

I have a business case where I need to access a clustered Redis cache in one AWS account (let's say account A) from another account (account B).

I have used the solution described in the link below, and for the most part it works: Base Solution

The base solution works fine if I access the clustered Redis via redis-py; however, if I try to use it with redis-py-cluster, it fails.

I am testing all this in a staging environment where the Redis cluster has only one node, but in production it has two nodes, so the plain redis-py approach will not work for me.

Below is my sample code:

redis = "3.5.3"
redis-py-cluster = "2.1.3"
==============================


from redis import Redis
from rediscluster import RedisCluster

respCluster = 'error'
respRegular = 'error'

host = "vpce-XXX.us-east-1.vpce.amazonaws.com"
port = 6379

try:
    # Cluster-aware client: discovers the remaining nodes from the startup node
    ru = RedisCluster(startup_nodes=[{"host": host, "port": port}],
                      decode_responses=True,
                      skip_full_coverage_check=True)
    respCluster = ru.get('ABC')
except Exception as e:
    print(e)

try:
    # Plain client: talks only to the single host/port it was given
    ru = Redis(host=host, port=port, decode_responses=True)
    respRegular = ru.get('ABC')
except Exception as e:
    print(e)

print({"respCluster": respCluster, "respRegular": respRegular})

The above code works perfectly in account A, but in account B the output I got was

{'respCluster': 'error', 'respRegular': '123456789'}

And the error that I am getting is

rediscluster.exceptions.ClusterError: TTL exhausted

In account A we are running this with AWS ECS + EC2 + Docker, and in account B we are running the code in an AWS EKS (Kubernetes) pod.

What should I do to make redis-py-cluster work in this case? Or is there an alternative to redis-py-cluster in Python for accessing a multi-node Redis cluster?

I know this is a highly specific case, any help is appreciated.

EDIT 1: Upon further research, it seems that "TTL exhausted" is a general error; in the logs the initial error is

redis.exceptions.ConnectionError:
Error 101 connecting to XX.XXX.XX.XXX:6379. Network is unreachable

Here XX.XXX.XX.XXX is the IP of the Redis cluster in account A. This is strange, since redis-py also connects to the same IP and port, so this error should not occur.

-- Abhishek Patil
amazon-web-services
docker
kubernetes
python
redis

2 Answers

7/11/2021

So it turns out the issue was due to how redis-py-cluster manages hosts and ports.

When a new redis-py-cluster object is created, it gets a list of host IPs from the Redis server (i.e. the Redis cluster host IPs from account A), after which the client tries to connect to those hosts and ports.

In normal cases this works, because the initial host and the IPs from the response are one and the same (i.e. the host and port added at the time of object creation).

In our case, however, the object-creation host and port are obtained from the DNS name of the endpoint service in account B.

This leads to the code trying to access the actual IPs from account A instead of the DNS name from account B.

The issue was resolved using host-port remapping: we bound the node IPs returned by the Redis server in account A to the DNS name of account B's endpoint service.
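The remapping described above corresponds to the host_port_remap option that redis-py-cluster offers (in 2.1.x). A minimal sketch of the rewrite rule it applies — the endpoint DNS name and the node IPs below are placeholders, not values from the question:

```python
# Account B's PrivateLink endpoint DNS name (placeholder)
ENDPOINT_DNS = "vpce-XXX.us-east-1.vpce.amazonaws.com"

# Map each node IP that the cluster in account A advertises back to the
# endpoint DNS name that is actually reachable from account B.
remap_rules = [
    {"from_host": "10.0.0.1", "from_port": 6379,
     "to_host": ENDPOINT_DNS, "to_port": 6379},
    {"from_host": "10.0.0.2", "from_port": 6379,
     "to_host": ENDPOINT_DNS, "to_port": 6379},
]

def remap(host, port, rules):
    """Apply the first matching rule; mirrors what host-port remapping does."""
    for rule in rules:
        if rule["from_host"] == host and rule["from_port"] == port:
            return rule["to_host"], rule["to_port"]
    return host, port  # no rule matched: connect as advertised

# An advertised node IP gets rewritten to the endpoint DNS name:
print(remap("10.0.0.1", 6379, remap_rules))

# With redis-py-cluster itself, the equivalent (assumed usage) would be:
# rc = RedisCluster(startup_nodes=[{"host": ENDPOINT_DNS, "port": 6379}],
#                   host_port_remap=remap_rules,
#                   decode_responses=True, skip_full_coverage_check=True)
```

The helper above only illustrates the mapping; in practice the library performs the rewrite internally whenever it learns a node address from the cluster.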

-- Abhishek Patil
Source: StackOverflow

7/8/2021

Based on your comment:

this was not possible because the VPCs in Account-A and Account-B had the same CIDR range. Peered VPCs can't have the same CIDR range.

I think what you are looking for is impossible. Routing within a VPC always happens first - before any route tables are considered at all. Said another way, if the destination of the packet lies within the sending VPC, it will never leave that VPC: AWS will try to route it within its own VPC, even if that IP isn't in use in the VPC at the time.

So, if you are trying to communicate with another VPC that has the same IP range as yours, even if you specifically add a route to egress traffic to a different IP (but in the same range), that rule will be silently ignored and AWS will try to deliver the packet in the originating VPC, which is not what you are trying to accomplish.
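The underlying conflict is easy to check: two networks with the same CIDR block overlap, which is exactly what rules out peering here. A quick sketch using Python's standard-library ipaddress module (the CIDR ranges are illustrative, not taken from the question):

```python
import ipaddress

# Illustrative CIDR ranges for the two VPCs (placeholders).
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.0.0.0/16")

# Identical (or overlapping) ranges cannot be peered: a packet destined
# for 10.0.x.y matches the sender's own local VPC route first and never
# leaves the originating VPC.
print(vpc_a.overlaps(vpc_b))  # True -> VPC peering is not an option

# A non-overlapping pair, by contrast, could be peered:
vpc_c = ipaddress.ip_network("172.16.0.0/16")
print(vpc_a.overlaps(vpc_c))  # False
```

This is why PrivateLink (which exposes a service through an endpoint inside the consumer's own address space) works where peering cannot.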

-- Foghorn
Source: StackOverflow