Long-lived connections within Docker container intermittently die

8/4/2017

I have a Docker image running inside Kubernetes with a Python application that holds a long-lived connection to MySQL. After seemingly random periods, typically between 10 and 30 minutes, the connection dies because the underlying socket loses its connection to the external host. I have tested this Docker container both locally and elsewhere in my production environment (outside Kubernetes) without encountering any connection errors.
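As a stopgap while debugging, I wrap the query loop so a dropped socket triggers a reconnect instead of crashing the application. This is a minimal sketch; `connect` and `work` are hypothetical placeholders for the application's own connection factory and query loop, not anything from the actual code:

```python
import time

def run_with_reconnect(connect, work, retries=5, backoff=1.0):
    """Re-run `work` on a fresh connection after the socket drops.

    `connect` and `work` are hypothetical stand-ins for the app's own
    connection factory and query loop. Retries with linear backoff.
    """
    attempt = 0
    while True:
        conn = connect()  # establish (or re-establish) the connection
        try:
            return work(conn)
        except (ConnectionError, OSError):
            attempt += 1
            if attempt > retries:
                raise  # give up after too many consecutive failures
            time.sleep(backoff * attempt)  # linear backoff between retries
```

This obviously doesn't fix the underlying network problem, but it keeps the service limping along while I investigate.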

Here is the docker version and uname output for the image running:

$ docker --version
Docker version 1.12.6, build 78d1802

$ uname -a
Linux c1b1f31a4048 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The following is the uname output for the host:

Linux ip-10-2-110-119 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux

I have seen discussions where people had issues with long-lived connections dying when other containers started and stopped, ultimately causing network loss across all containers on the host. I attempted to reproduce this scenario outside of Kubernetes by manually starting and stopping other containers, but was unable to reproduce the connection failure.

I had a theory that our NAT was dropping idle connections because of the long tcp_keepalive_time on the host (7200 seconds by default). I decreased it drastically to ensure that TCP keepalive packets are sent while the connection is idle, but this had no effect. In fact, I have witnessed the connection drop in the middle of streaming many rows from MySQL, so the connection is not even idle when it dies.
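For reference, I also tried enabling keepalive on the socket itself rather than relying on the host sysctls. This is a minimal sketch using Linux-specific socket options from Python's standard `socket` module; the default values for `idle`, `interval`, and `count` here are my own choices, not anything mandated:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes on a socket (Linux-specific options).

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between subsequent probes
    count:    failed probes before the kernel declares the peer dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Since the drops happen mid-stream on an active connection, keepalive probes on an idle socket were never likely to be the whole story, but it rules out one variable.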

Is there specific network configuration that should be used to ensure that long-lived connections do not die in this environment?

-- ccapurso
docker
kubernetes
networking
sockets

0 Answers