Nameko/RabbitMQ: OSError: Server unexpectedly closed connection

12/29/2019

I have two nameko services that communicate over RPC via RabbitMQ. Locally with docker-compose everything works fine. Then I deployed everything to a Kubernetes/Istio cluster on DigitalOcean and started getting the following errors. They repeat continuously, once every 10/20/60 minutes. Communication between the services works fine (before and after the reconnect, I suppose), but the logs are cluttered with these unexpected reconnections that should not happen.

Helm RabbitMQ configuration file

I tried increasing the RAM and CPU configuration (to the values in the configuration file above: 512Mb and 400m) but still see the same behavior.

NB: I don't touch the services after deployment (no messages are sent, no requests are made), and the first error shows up after around 60 minutes. When I do make requests they succeed, but these errors still appear in the logs afterwards.

Nameko service log:

"Connection to broker lost, trying to re-establish connection...",
"exc_info": "Traceback (most recent call last):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 175, in run
for _ in self.consume(limit=None, **kwargs):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 197, in consume
conn.drain_events(timeout=safety_interval)
File \"/usr/local/lib/python3.6/site-packages/kombu/connection.py\", line 323, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File \"/usr/local/lib/python3.6/site-packages/kombu/transport/pyamqp.py\", line 103, in drain_events
return connection.drain_events(**kwargs)
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 505, in drain_events
while not self.blocking_read(timeout):
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 510, in blocking_read
frame = self.transport.read_frame()
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 252, in read_frame
frame_header = read(7, True)
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 446, in _read
raise IOError('Server unexpectedly closed connection')
OSError: Server unexpectedly closed connection"}
{"name": "kombu.mixins", "asctime": "29/12/2019 20:22:54", "levelname": "INFO", "message": "Connected to amqp://user:**@rabbit-rabbitmq:5672//"}

RabbitMQ log

2019-12-29 20:22:54.563 [warning] <0.718.0> closing AMQP connection <0.718.0> (127.0.0.1:46504 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.563 [warning] <0.705.0> closing AMQP connection <0.705.0> (127.0.0.1:46502 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.681 [info] <0.3424.0> accepting AMQP connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672)
2019-12-29 20:22:54.689 [info] <0.3424.0> connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
2019-12-29 20:22:54.690 [info] <0.3431.0> accepting AMQP connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672)
2019-12-29 20:22:54.696 [info] <0.3431.0> connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'

UPD:

Rabbit pod yaml

-- Max
docker
kubernetes
nameko
python
rabbitmq

3 Answers

1/2/2020

I think this is related to this.

Try installing the netstat utility and running it to check whether you have too many connections in states other than ESTABLISHED (e.g. piles of TIME_WAIT or CLOSE_WAIT sockets).

Then try adding these sysctl settings:

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 4
net.ipv4.tcp_tw_reuse = 1

see this
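The per-state count the answer describes is usually obtained with `netstat -ant | awk '{print $6}' | sort | uniq -c`. The same check can be sketched in Python by parsing `/proc/net/tcp` directly (Linux-only; this is an illustration of the idea, not part of the original answer):

```python
from collections import Counter

# Kernel TCP state codes, per Linux's /proc/net/tcp documentation
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states(path="/proc/net/tcp"):
    """Count sockets per TCP state (4th column of /proc/net/tcp is the state code)."""
    counts = Counter()
    with open(path) as f:
        next(f, None)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) > 3:
                counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts

if __name__ == "__main__":
    import os
    if os.path.exists("/proc/net/tcp"):
        for state, n in count_tcp_states().most_common():
            print(f"{n:6d} {state}")
```

A large number of TIME_WAIT or CLOSE_WAIT entries (relative to ESTABLISHED) is the symptom the keepalive/timeout settings above are meant to address.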

-- LinPy
Source: StackOverflow

1/8/2020

Have you tried increasing the heartbeat of the connection? It is likely that your connection gets terminated at a lower level due to inactivity.
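For nameko specifically, recent releases read a `HEARTBEAT` key from the service config (verify against the docs for the version you run); a hedged sketch of such a config file:

```yaml
# config.yaml -- sketch only; HEARTBEAT is in seconds and is honoured
# only by nameko versions with AMQP heartbeat support.
AMQP_URI: amqp://user:password@rabbit-rabbitmq:5672/
HEARTBEAT: 30
```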

Also make sure that you have enough resources to run all containers on the host machine.

I had a similar issue and I am not sure which one of the following solved it for me:

  1. Proper resource management.
  2. Making the Dockerfile entry point a bash script that runs the worker file in an infinite loop. (I know that one solved the memory leaks: the bash script executes the file with your code, your code listens for a message, gets one, handles it, and exits, then the bash script loads it again...) I had my workers restarting after each message, so the whole worker exits and a new one starts each time; that turned out to be a bad idea.
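The restart-loop entry point from step 2 can be sketched in Python rather than bash; a hedged illustration (the worker command and file name are hypothetical, and the `max_restarts` knob exists only so the loop can be bounded for testing):

```python
import subprocess
import time

def run_forever(cmd, restart_delay=1.0, max_restarts=None):
    """Re-run `cmd` each time it exits; with max_restarts=None this never returns."""
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        subprocess.run(cmd)  # blocks until the worker process exits
        restarts += 1
        time.sleep(restart_delay)
    return restarts

# Usage as an entry point (never returns; "worker.py" is hypothetical):
#     run_forever(["python", "worker.py"])
```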

Hope this gets you somewhere.

-- user126587
Source: StackOverflow

1/7/2020

The issue is the istio-proxy being injected as a sidecar container into the RabbitMQ pod. You need to exclude the RabbitMQ pod from Istio proxying; then it should work.
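One way to do that is to disable sidecar injection on the pod template via the standard Istio annotation; a hedged sketch (the resource name, labels, and image are illustrative, and whether your chart uses a Deployment or a StatefulSet depends on how RabbitMQ was installed):

```yaml
# Annotation on the pod template tells Istio not to inject the sidecar.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbit-rabbitmq
spec:
  serviceName: rabbit-rabbitmq
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:3.8
```

Alternatively, keeping the sidecar but bypassing it for AMQP traffic with `traffic.sidecar.istio.io/excludeInboundPorts: "5672"` may also work; check the Istio annotation docs for your Istio version.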

-- P Ekambaram
Source: StackOverflow