nginx-lego and autoscaler don't play well after scaling down

11/21/2018

I'm having trouble with nginx-lego (I know it's deprecated) and the node autoscaler. I had to scale up manually by temporarily patching minReplicas on an HPA to a high number. Everything scaled up fine, and new nodes were added to accommodate the extra pods.
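For reference, I bumped it with something along these lines (the HPA name and replica count here are just placeholders):

$ kubectl patch hpa <nginx-lego-hpa> -p '{"spec":{"minReplicas":20}}'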

After the traffic spike, I set the number back to normal (which is quite low), and now I'm seeing a lot of 502 Bad Gateway errors. Looking at the nginx-lego pod's log, I can see that plenty of requests are still being routed to pods that no longer exist (connection refused or No route to host).
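I pulled the errors out of the log with something like this (pod name redacted):

$ kubectl logs <nginx-lego-pod> | grep "connect() failed"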

2018/11/21 17:48:49 [error] 5546#5546: *6908265 connect() failed (113: No route to host) while connecting to upstream, client: 100.112.130.0, server: xxxx.com, request: "GET /public/images/social-instagram.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-instagram.png", host: "xxxx.com", referrer: "https://outlook.live.com/"
2018/11/21 17:48:49 [error] 5409#5409: *6908419 connect() failed (113: No route to host) while connecting to upstream, client: 10.5.143.204, server: xxxx.com, request: "GET /public/images/social-instagram.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-instagram.png", host: "xxxx.com"
2018/11/21 17:48:49 [error] 5546#5546: *6908420 connect() failed (111: Connection refused) while connecting to upstream, client: 10.5.143.204, server: xxxx.com, request: "GET /public/images/social-facebook.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-facebook.png", host: "xxxx.com"

Any idea what could be wrong?

I guess patching minReplicas probably isn't the best way to do this, but I knew there would be a spike and I didn't have a better idea for pre-scaling the whole cluster.

-- OndrejK
amazon-web-services
kubernetes
nginx-ingress

1 Answer

11/21/2018

This looks like a problem with your nginx ingress (lego) controller not updating its nginx.conf when scaling down. I would examine the nginx.conf and see if it's still pointing to backends that don't exist anymore.

$ kubectl cp <nginx-lego-pod>:/etc/nginx/nginx.conf ./nginx.conf
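Then compare the upstream IPs in the dumped config against the live endpoints of your backend service, for example (the service name is a placeholder):

$ grep -A 5 "upstream" nginx.conf
$ kubectl get endpoints <backend-service>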

If something looks odd, you might have to delete the pod so that it gets recreated by the ReplicaSet managing your nginx ingress controller pods.

$ kubectl delete pod <nginx-controller-pod>
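To confirm the ReplicaSet brings a replacement up, you can watch the pods (the label selector below is just an example; use whatever labels your controller pods actually carry):

$ kubectl get pods -l app=nginx-ingress-lego -w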

Then examine the nginx.conf again.

Another possibility is that the Endpoints for your backend Services aren't being updated by Kubernetes, although that would not be directly related to scaling your lego HPA up and down. You can check with:

$ kubectl get ep 

And see whether any of the listed addresses point to pods that don't exist anymore.
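For a single service, you can also compare its endpoint addresses against the pods that are actually running, for example (the service name is a placeholder):

$ kubectl get endpoints <backend-service> -o yaml
$ kubectl get pods -o wide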

-- Rico
Source: StackOverflow