I have two nameko services that communicate over RPC via RabbitMQ. Locally, with docker-compose, everything works fine. Then I deployed everything to a Kubernetes/Istio cluster on DigitalOcean and started getting the following errors. They recur continuously, roughly once every 10, 20, or 60 minutes. Communication between the services works fine (both before and after the reconnect, I suppose), but the logs are cluttered with these unexpected reconnections, which should not happen.
Helm RabbitMQ configuration file
I tried increasing the RAM and CPU configuration (to the values in the configuration files above: 512Mb and 400m), but the behavior is the same.
NB: I don't touch the services after deployment. No messages are sent and no requests are made, and the error first appears after around 60 minutes. When I make requests they succeed, but these errors still eventually show up in the logs afterwards.
Nameko service log:
"Connection to broker lost, trying to re-establish connection...",
"exc_info": "Traceback (most recent call last):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 175, in run for _ in self.consume(limit=None, **kwargs):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 197, in consume conn.drain_events(timeout=safety_interval)
File \"/usr/local/lib/python3.6/site-packages/kombu/connection.py\", line 323, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File \"/usr/local/lib/python3.6/site-packages/kombu/transport/pyamqp.py\", line 103, in drain_events
return connection.drain_events(**kwargs)
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 505, in drain_events
while not self.blocking_read(timeout):
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 510, in blocking_read\n frame = self.transport.read_frame()
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 252, in read_frame
frame_header = read(7, True)
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 446, in _read
raise IOError('Server unexpectedly closed connection')
OSError: Server unexpectedly closed connection"}
{"name": "kombu.mixins", "asctime": "29/12/2019 20:22:54", "levelname": "INFO", "message": "Connected to amqp://user:**@rabbit-rabbitmq:5672//"}
RabbitMQ log
2019-12-29 20:22:54.563 [warning] <0.718.0> closing AMQP connection <0.718.0> (127.0.0.1:46504 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.563 [warning] <0.705.0> closing AMQP connection <0.705.0> (127.0.0.1:46502 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.681 [info] <0.3424.0> accepting AMQP connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672)
2019-12-29 20:22:54.689 [info] <0.3424.0> connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
2019-12-29 20:22:54.690 [info] <0.3431.0> accepting AMQP connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672)
2019-12-29 20:22:54.696 [info] <0.3431.0> connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
UPD:
I think that is related to this
Try installing the netstat utility and running it to see whether you have too many connections in states other than ESTABLISHED, and try adding these to your settings:
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 4
net.ipv4.tcp_tw_reuse = 1
see this
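The netstat check suggested above can be scripted. The following is a minimal sketch that tallies TCP connection states from `netstat -ant` style output. The function name `count_tcp_states` and the sample text are hypothetical, made up for illustration; the sample is not output from the cluster in the question.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Tally the State column of TCP rows in `netstat -ant` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: proto, recv-q, send-q, local, foreign, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# Made-up sample in the Linux `netstat -ant` layout (illustration only).
SAMPLE_OUTPUT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:5672            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:5672          127.0.0.1:43466         ESTABLISHED
tcp        0      0 127.0.0.1:5672          127.0.0.1:43468         ESTABLISHED
tcp        0      0 127.0.0.1:46502         127.0.0.1:5672          TIME_WAIT
tcp        0      0 127.0.0.1:46504         127.0.0.1:5672          TIME_WAIT
tcp        0      0 127.0.0.1:46506         127.0.0.1:5672          TIME_WAIT
tcp        0      0 127.0.0.1:46508         127.0.0.1:5672          CLOSE_WAIT
"""

for state, count in count_tcp_states(SAMPLE_OUTPUT).most_common():
    print(f"{state:<12} {count}")
```

In practice you would feed it the real output, for example via `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`, and watch whether the TIME_WAIT or CLOSE_WAIT counts keep growing.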
Have you tried increasing the heartbeat of the connection? It is likely that your connection gets terminated at a lower level due to inactivity.
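As a rough sketch of what adjusting the heartbeat could look like in nameko: this assumes nameko reads the AMQP heartbeat interval, in seconds, from a `HEARTBEAT` config key, which is worth verifying against your installed version. The credentials, the value 30, and the helper `render_config` are placeholders chosen for illustration, not recommendations.

```python
import json

# Assumption: nameko takes the heartbeat interval (seconds) from "HEARTBEAT".
NAMEKO_CONFIG = {
    "AMQP_URI": "amqp://user:password@rabbit-rabbitmq:5672//",  # placeholder credentials
    "HEARTBEAT": 30,  # illustrative value only
}

def render_config(config):
    """Render a flat config dict as YAML lines.

    JSON scalars (double-quoted strings, plain integers) are also valid
    YAML scalars, so json.dumps is enough for a flat mapping like this.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in config.items())

print(render_config(NAMEKO_CONFIG), end="")
```

The printed text can be saved as a config file and passed to `nameko run --config`. Whether a different interval actually stops the disconnects depends on what is closing the idle connection.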
Also make sure that you have enough resources to run all containers on the host machine.
I had a similar issue, and I am not sure which of the following solved it for me:
Hope this gets you somewhere.
The issue is the Istio proxy being injected as a sidecar container into the RabbitMQ pod. You need to exclude the RabbitMQ pod from Istio proxy injection; then it should work.
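One common way to exclude a pod from injection is the `sidecar.istio.io/inject: "false"` annotation on the pod template. Below is a minimal sketch that builds such a merge patch and prints a `kubectl patch` command for it. The workload kind and name (`statefulset`, `rabbit-rabbitmq`) are assumptions guessed from the broker hostname in the logs, so substitute your own.

```python
import json

# Hypothetical workload, guessed from the hostname in the question's logs.
WORKLOAD_KIND = "statefulset"
WORKLOAD_NAME = "rabbit-rabbitmq"

def sidecar_exclusion_patch():
    """Build a merge patch that opts the pod template out of Istio injection."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    # The value must be the string "false", not a boolean.
                    "annotations": {"sidecar.istio.io/inject": "false"}
                }
            }
        }
    }

patch_json = json.dumps(sidecar_exclusion_patch())
print(f"kubectl patch {WORKLOAD_KIND} {WORKLOAD_NAME} --type merge -p '{patch_json}'")
```

Since the broker here is installed with Helm, a manual patch may be overwritten on the next upgrade. The same annotation would normally be set through the chart's pod-annotation value, whose exact name depends on the chart.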