Clients get a RemoteDisconnected error at 60s from my flask/gunicorn/nginx-ingress stack. Why? Timeouts set...nothing

12/3/2019

I've got a Python Flask app behind Gunicorn behind nginx-ingress, and I'm honestly running out of ideas. After a long-running computation, clients get a RemoteDisconnected error at 60 seconds, with nothing untoward in the logs. Gunicorn is set to a massive timeout, so it's not that. Nginx is quite happy to terminate at 60 seconds without any error:

xxx.xx.xx.xx - [xxx.xx.xx.xx] - - [03/Dec/2019:19:32:08 +0000] "POST /my/url" 499 0 "-" "python-requests/2.22.0" 1516 59.087 [my-k8s-service] [] xxx.xx.xx.xx:port 0 59.088 - c676c3df9a40c1692b1789e677a27268

No error, no warning, nothing. Since 60s was so suspect, I figured it was proxy-read-timeout or upstream-keepalive-timeout... nothing. I've set both in a ConfigMap and in the .yaml files using annotations, and exec'ing into the pod and running cat /etc/nginx/nginx.conf shows the relevant server block has the test values in place:

    proxy_connect_timeout                   72s;
    proxy_send_timeout                      78s;
    proxy_read_timeout                      75s;

...funny values set to better identify the result. And yet... still disconnects at 60 sec.

The "right" answer, which we're doing, is to rewrite the thing to use asynchronous calls, but it really bothers me that I don't know how to fix this. Am I setting the wrong thing? In the background, the Flask app keeps running and completes after several minutes, but Nginx reports the POST dying after a minute with a 499, which means a client disconnect. I'm totally baffled.
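For what it's worth, the asynchronous rewrite doesn't have to be elaborate. A minimal sketch of the pattern (route names and the in-memory job store are hypothetical, and a real deployment would use a task queue like Celery or RQ rather than a thread and a dict):

```python
import threading
import time
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> status dict; in-memory stand-in for a real job store


def long_computation(job_id):
    # Placeholder for the multi-minute computation.
    time.sleep(0.1)
    jobs[job_id] = {"status": "done", "result": 42}


@app.route("/my/url", methods=["POST"])
def start_job():
    # Return immediately with a job id instead of holding the
    # connection open past the proxy's 60 s timeout.
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending"}
    threading.Thread(target=long_computation, args=(job_id,), daemon=True).start()
    return jsonify({"job_id": job_id}), 202


@app.route("/my/url/<job_id>", methods=["GET"])
def poll_job(job_id):
    # Client polls this until status flips to "done".
    return jsonify(jobs.get(job_id, {"status": "unknown"}))
```

Each individual request now finishes in milliseconds, so no timeout in the chain (ELB, Nginx, Gunicorn) ever comes into play.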

We're on AWS, so I even tried adding the annotation

Annotations:       service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 93

...to the service just in case (from https://kubernetes.io/docs/concepts/services-networking/service/). No dice: still dies at 60s.
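For reference, that annotation lives on the Service manifest, and the value must be a quoted string or the manifest fails validation (the name below is a placeholder; 93 is just my test value):

```
apiVersion: v1
kind: Service
metadata:
  name: my-k8s-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "93"
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8000
```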

What am I missing?

-- vputz
flask
kops
kubernetes
nginx

1 Answer

3/11/2020
It seems there is no issue on the AWS load balancer side; it's an NGINX-to-Gunicorn connection issue. You need to raise the proxy timeout values. Try these annotations on the Ingress rules:

nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"

If you are using Gunicorn, also start it with --timeout 300 so the worker isn't killed before the proxy timeout expires.
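Put together on an Ingress manifest, it might look like this (resource names and host are placeholders; note the annotation values are plain second counts, quoted as strings):

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /my/url
            pathType: Prefix
            backend:
              service:
                name: my-k8s-service
                port:
                  number: 80
```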

-- ANISH KUMAR MOURYA
Source: StackOverflow