I'm having trouble with nginx-lego (I know it's deprecated) and the node autoscaler. I had to scale up manually by temporarily patching the HPA's minReplicas to a high number. Everything scaled well, and new nodes were added because of the increased number of pods.
After the traffic spike, I set the number back to normal (which is really low), and now I see a lot of 502 Bad Gateway errors. When I examined the nginx-lego pod's log, I could see that plenty of requests were going to pods that no longer exist (Connection refused or No route to host).
2018/11/21 17:48:49 [error] 5546#5546: *6908265 connect() failed (113: No route to host) while connecting to upstream, client: 100.112.130.0, server: xxxx.com, request: "GET /public/images/social-instagram.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-instagram.png", host: "xxxx.com", referrer: "https://outlook.live.com/"
2018/11/21 17:48:49 [error] 5409#5409: *6908419 connect() failed (113: No route to host) while connecting to upstream, client: 10.5.143.204, server: xxxx.com, request: "GET /public/images/social-instagram.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-instagram.png", host: "xxxx.com"
2018/11/21 17:48:49 [error] 5546#5546: *6908420 connect() failed (111: Connection refused) while connecting to upstream, client: 10.5.143.204, server: xxxx.com, request: "GET /public/images/social-facebook.png HTTP/1.1", upstream: "http://X.X.X.X:3000/public/images/social-facebook.png", host: "xxxx.com"
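To see which upstream addresses are failing and how often, error lines like the ones above can be tallied per upstream. This is a minimal sketch, assuming the standard nginx error-log format shown above (`upstream: "http://HOST:PORT/..."`); `failed_upstreams` is a hypothetical helper name, not part of nginx or kubectl.

```shell
# Read nginx error-log lines on stdin and print "COUNT HOST:PORT" for every
# upstream whose connect() failed, most frequent first.
# Assumes the log format shown above: ... upstream: "http://HOST:PORT/...".
failed_upstreams() {
  grep 'connect() failed' \
    | sed -n 's|.*upstream: "http://\([^/"]*\).*|\1|p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'   # normalise uniq -c padding to "COUNT HOST:PORT"
}
```

Usage: `kubectl logs <nginx-lego-pod> | failed_upstreams`. The addresses it prints can then be compared against the IPs of the pods that are actually running.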
Any idea what could be wrong?
I guess patching minReplicas probably isn't the best way to do it, but I knew there would be a spike and I had no better idea of how to pre-scale the whole cluster.
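For reference, the temporary minReplicas bump described above can be expressed as a JSON merge patch against the HPA. This is a sketch only: `hpa_min_patch` is a hypothetical helper, and `<hpa-name>` and the value 20 are placeholders, not values from the question.

```shell
# Build a JSON merge patch that sets an HPA's spec.minReplicas.
# hpa_min_patch is a hypothetical helper, not part of kubectl.
hpa_min_patch() {
  n="$1"
  # Accept only positive integers without leading zeros, so the output is valid JSON.
  case "$n" in
    ''|*[!0-9]*|0*) echo "minReplicas must be a positive integer" >&2; return 1 ;;
  esac
  printf '{"spec":{"minReplicas":%s}}\n' "$n"
}
```

Usage: `kubectl patch hpa <hpa-name> --type=merge -p "$(hpa_min_patch 20)"`, and the same command with the original value to scale back down afterwards.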
This looks like a problem with your nginx ingress (lego) controller not updating nginx.conf when scaling down. I would examine nginx.conf and see whether it points to backends that no longer exist.
$ kubectl cp <nginx-lego-pod>:/etc/nginx/nginx.conf ./nginx.conf
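Once you have a local copy, the check can be partly automated by listing upstream `server IP:PORT` entries whose IP is not held by any running pod. This sketch assumes the controller writes backend pod IPs into `upstream` blocks as `server 10.x.x.x:PORT ...;` lines, as static-config controllers of that era did (newer controllers that balance dynamically may not); `stale_upstreams` is a hypothetical helper.

```shell
# Print upstream "server IP:PORT" entries from an nginx.conf whose IP is not in
# the list of live pod IPs passed as the remaining arguments.
# Assumes backends appear as "server 10.x.x.x:PORT ...;" lines.
stale_upstreams() {
  conf="$1"; shift
  live=" $* "   # space-padded so " $ip " matches whole addresses only
  grep -oE '^[[:space:]]*server[[:space:]]+[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' "$conf" \
    | awk '{print $2}' \
    | sort -u \
    | while IFS= read -r addr; do
        ip="${addr%%:*}"
        case "$live" in
          *" $ip "*) ;;                # still backed by a running pod
          *) printf '%s\n' "$addr" ;;  # no running pod has this IP
        esac
      done
}
```

Usage: `stale_upstreams ./nginx.conf $(kubectl get pods --all-namespaces -o jsonpath='{.items[*].status.podIP}')`. Any address it prints is a backend nginx still routes to but that no pod owns.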
If something looks odd, you may have to delete the pod so that it is re-created by the ReplicaSet that manages your nginx ingress controller pods.
$ kubectl delete pod <nginx-controller-pod>
Then examine nginx.conf again.
Another possible issue is that Kubernetes is not updating the endpoints for your backend services, but that would not be directly related to scaling your lego HPA up or down. You can check with:
$ kubectl get ep
Then check whether any endpoints still point to pods that no longer exist.
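Comparing the endpoint IPs of a service with the current pod IPs makes stale entries easy to spot. A minimal sketch: `stale_endpoint_ips` is a hypothetical helper that only compares two whitespace-separated IP lists, and `<service-name>` is a placeholder for your backend service.

```shell
# Print every IP from the first list (endpoint IPs) that is absent from the
# second list (live pod IPs). Both arguments are whitespace-separated lists.
stale_endpoint_ips() {
  pod_ips=" $2 "                 # space-padded for whole-address matching
  for ip in $1; do               # unquoted on purpose: split the list on whitespace
    case "$pod_ips" in
      *" $ip "*) ;;              # endpoint still matches a running pod
      *) printf '%s\n' "$ip" ;;  # endpoint points at a pod that is gone
    esac
  done
}
```

Usage:

```shell
ep_ips="$(kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}')"
pod_ips="$(kubectl get pods -o jsonpath='{.items[*].status.podIP}')"
stale_endpoint_ips "$ep_ips" "$pod_ips"
```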