nginx ingress controller worker shutdown takes hours

5/9/2018

I'm running a fairly standard nginx ingress controller with an empty ConfigMap. The nginx config gets reloaded (not restarted) every minute or so, and I'm seeing workers pile up in the "shutting down" state. Here is the output of ps aux:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4240   796 ?        Ss   14:26   0:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingre
root         8  3.3  0.6 102224 96204 ?        Ssl  14:26   8:15 /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-lb-conf
root        21  1.4  0.9 278328 153220 ?       S    14:26   3:38 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nobody     409  0.1  0.9 548800 150696 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     410  0.1  0.9 548800 150760 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     552  0.1  0.9 548800 150752 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     695  0.3  0.9 548800 150808 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     696  0.4  0.9 548800 150760 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     697  0.4  0.9 548800 150864 ?       Sl   18:29   0:00 nginx: worker process is shutting down
nobody     836  0.3  0.9 548800 150696 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     837  0.3  0.9 548800 150680 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     838  0.3  0.9 548800 150648 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     839  0.3  0.9 548800 150652 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     983  0.7  0.9 548800 150732 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     984  0.9  0.9 548800 150816 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     985  0.8  0.9 548800 150680 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody     986  0.8  0.9 548800 150784 ?       Sl   18:30   0:00 nginx: worker process is shutting down
nobody    1120  1.0  0.9 548800 149592 ?       Sl   18:31   0:00 nginx: worker process
nobody    1121  1.0  0.9 548800 150584 ?       Sl   18:31   0:00 nginx: worker process
nobody    1122  0.5  0.9 548800 149364 ?       Sl   18:31   0:00 nginx: worker process
nobody    1123  0.7  0.9 548800 150588 ?       Sl   18:31   0:00 nginx: worker process
root      1262 29.0  0.4 169008 64084 ?        S    18:31   0:00 /usr/sbin/nginx -s reload -c /etc/nginx/nginx.conf
root      1263  0.0  0.0  34432  2948 ?        R+   18:31   0:00 ps aux
root     29790  0.0  0.0  18252  3328 ?        Ss   18:14   0:00 /bin/bash
nobody   32583  0.4  0.9 548800 150836 ?       Sl   18:25   0:01 nginx: worker process is shutting down
nobody   32584  0.4  0.9 548800 150784 ?       Sl   18:25   0:01 nginx: worker process is shutting down
nobody   32585  0.4  0.9 548800 150940 ?       Sl   18:25   0:01 nginx: worker process is shutting down
nobody   32618  0.4  0.9 548800 150884 ?       Sl   18:25   0:01 nginx: worker process is shutting down
nobody   32733  0.1  0.9 548800 150720 ?       Sl   18:28   0:00 nginx: worker process is shutting down
nobody   32735  0.2  0.9 548800 150820 ?       Sl   18:28   0:00 nginx: worker process is shutting down

As you can see, each worker holds roughly 150 MB of resident memory (the RSS column), so memory usage on this machine is getting out of control.
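To put a number on the pile-up, a small script along these lines could tally the workers from captured ps aux text. This is only a sketch: the names summarize_workers and WorkerSummary are made up for illustration, and it assumes the standard ps aux column layout shown above. Summing RSS also overstates real usage, since workers share pages with the master process.

```python
from typing import NamedTuple


class WorkerSummary(NamedTuple):
    active: int                # workers still accepting connections
    shutting_down: int         # workers marked "is shutting down"
    shutting_down_rss_kb: int  # summed RSS of the shutting-down workers, in KiB


def summarize_workers(ps_aux_output: str) -> WorkerSummary:
    """Count nginx workers in `ps aux` text and total the RSS of the old ones."""
    active = shutting_down = rss_kb = 0
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10].rstrip()
        if not command.startswith("nginx: worker process"):
            continue  # skips the header, the master process, and everything else
        if command.endswith("is shutting down"):
            shutting_down += 1
            rss_kb += int(fields[5])  # RSS column, reported in KiB
        else:
            active += 1
    return WorkerSummary(active, shutting_down, rss_kb)
```

Feeding it the ps aux text captured inside the controller container after each reload would show whether the shutting-down count keeps growing or levels off.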

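The worker-shutdown-timeout setting is at its default of 10s, but these workers have been shutting down for far longer than that: in the listing above, workers started at 18:25 are still shutting down at 18:31, several reloads later. Is anyone else running into this issue? Is there a way to troubleshoot it?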
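One way to confirm that specific workers outlive the timeout is to take two ps aux snapshots more than 10s apart and see which shutting-down PIDs appear in both. The sketch below assumes the same ps aux layout as above; the helper names shutting_down_pids and still_shutting_down are hypothetical, and it ignores the unlikely case of a PID being reused between snapshots.

```python
from typing import List, Set


def shutting_down_pids(ps_aux_output: str) -> Set[int]:
    """Return PIDs of nginx workers that `ps aux` reports as shutting down."""
    pids = set()
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[10].strip() == "nginx: worker process is shutting down":
            pids.add(int(fields[1]))
    return pids


def still_shutting_down(first_snapshot: str, second_snapshot: str) -> List[int]:
    """PIDs shutting down in both snapshots, so for at least the time between them."""
    return sorted(shutting_down_pids(first_snapshot) & shutting_down_pids(second_snapshot))
```

Any PID this returns for snapshots taken, say, 30s apart has been shutting down for at least 30s, which would be a concrete data point that the 10s timeout is not taking effect for that worker.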

Here are some redacted example lines from the logs:

W0510 12:41:23.245036       7 backend_ssl.go:44] error obtaining PEM from secret quuz/quuz-quuz.com-tls: error retrieving secret quuz/quuz-quuz.com-tls: secret quuz/quuz-quuz.com-tls was not found
2018/05/10 12:41:23 [error] 27313#27313: connect() to [2600:1407:16::b832:eefa]:80 failed (101: Network is unreachable) while requesting certificate status, responder: ocsp.int-x3.letsencrypt.org, peer: [2600:1407:16::b832:eefa]:80, certificate: "/ingress-controller/ssl/foo-foo-www.foo.com-tls.pem"
W0510 12:41:37.174784       7 controller.go:1064] ssl certificate for host bar.com is about to expire in 10 days
W0510 12:41:37.175872       7 controller.go:1047] ssl certificate "baz/baz-baz.com-tls" does not exist in local store
35.186.144.97 - [35.186.144.97] - - [10/May/2018:12:46:23 +0000] "GET / HTTP/1.1" 200 6272 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 365 0.097 [qux-quxq-80] 10.0.33.184:80 20572 0.097 200
50.205.217.121 - [50.205.217.121] - - [10/May/2018:12:46:24 +0000] "GET /socket/websocket?jwtToken=REDACTED&gamePlayApiSession=REDACTED&vsn=2.0.0 HTTP/1.1" 403 0 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" 997 0.004 [quux-quux-80] 10.0.166.36:80 0 0.004 403
-- Jesse Shieh
kubernetes
kubernetes-ingress
nginx

0 Answers