I have an EKS cluster running Kubernetes 1.14. I deployed the NGINX ingress controller on the cluster by following the ingress-nginx deployment instructions.
Here are the steps that I followed:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/aws/service-l4.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/aws/patch-configmap-l4.yaml
But I keep getting these errors intermittently in the ingress controller logs:
2019/10/15 15:21:25 [error] 40#40: *243746 upstream timed out (110: Connection timed out) while connecting to upstream, client: 63.xxx.xx.xx, server: x.y.com, request: "HEAD / HTTP/1.1", upstream: "http://172.20.166.58:80/", host: "x.y.com"
And sometimes these:
{"log":"2019/10/15 02:58:40 [error] 119#119: *2985 connect() failed (113: No route to host) while connecting to upstream, client: xx.1xx.81.1xx, server: a.b.com, request: \"OPTIONS /api/v1/xxxx/xxxx/xxx HTTP/2.0\", upstream: \"http://172.20.195.137:9050/api/xxx/xxx/xxxx/xxx\", host: \"a.b.com\", referrer: \"https://x.y.com/app/connections\"\n","stream":"stderr","time":"2019-10-15T02:58:40.565930449Z"}
I am using the native Amazon VPC CNI plugin for Kubernetes for networking -
amazon-k8s-cni:v1.5.4
I noticed that a couple of the 5 replicas of the nginx ingress controller pod were not able to talk to the backend application. To check connectivity between the ingress controller pods and the backend applications, I opened a shell in one of the affected ingress controller pods and tried to curl the backend service, and it timed out. When I opened a shell in another backend pod and curled the same backend service, it returned a 200 status code. As a temporary fix, I deleted the replicas that could not talk to the backend and let them be recreated. That resolved the issue for a while, but after a few hours the same errors started showing up again.
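The manual check described above can be repeated across all replicas in one pass. This is only a rough sketch: the function name `check_backend_from_ingress_pods` is made up for illustration, and the default namespace `ingress-nginx` and label `app.kubernetes.io/name=ingress-nginx` are assumptions about what mandatory.yaml created, so adjust them if your deployment differs. It also assumes `curl` is available inside the controller image.

```shell
# Hypothetical helper: curl a backend URL from every ingress controller pod
# and report which replicas can reach it.
# Assumptions: namespace "ingress-nginx" and the label selector below match
# your deployment; curl exists in the controller image.
check_backend_from_ingress_pods() {
  url="$1"
  ns="${2:-ingress-nginx}"
  selector="${3:-app.kubernetes.io/name=ingress-nginx}"

  # List the names of all controller pods matching the selector.
  pods="$(kubectl get pods -n "$ns" -l "$selector" \
    -o jsonpath='{.items[*].metadata.name}')"

  for pod in $pods; do
    # Give up after 5 seconds so a blackholed route does not hang the loop.
    if kubectl exec -n "$ns" "$pod" -- \
        curl -s -o /dev/null --max-time 5 "$url"; then
      echo "$pod OK"
    else
      echo "$pod FAILED"
    fi
  done
}
```

Calling it with the backend's cluster-internal URL (for example the service address that appears as `upstream` in the error log) should show whether the failures are confined to specific replicas.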
amazon-k8s-cni:v1.5.4
has known issues with DNS and pod-to-pod communication. It's recommended to revert to
amazon-k8s-cni:v1.5.3
I had the same issues you're seeing, and going back to v1.5.3 seemed to resolve them for me. I think they recently reverted the default plugin version to v1.5.3 for newly launched EKS clusters anyway.
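For reference, here is one possible way to check which CNI version is running and pin it to v1.5.3. Treat it as a sketch under assumptions: it presumes the plugin runs as the `aws-node` DaemonSet in `kube-system` with a container also named `aws-node`, and the function names are made up for illustration. It reuses whatever image repository the DaemonSet already points at and swaps only the tag.

```shell
# Print the image currently used by the CNI DaemonSet.
# Assumption: the DaemonSet is "aws-node" in namespace "kube-system".
current_cni_image() {
  kubectl get daemonset aws-node -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Hypothetical helper: keep the existing image repository, change the tag.
# Assumption: the container inside the DaemonSet is named "aws-node".
pin_cni_version() {
  version="$1"
  image="$(current_cni_image)"
  if [ -z "$image" ]; then
    echo "could not read the current aws-node image" >&2
    return 1
  fi
  repo="${image%:*}"   # drop everything from the last ':' (the tag)
  kubectl set image daemonset/aws-node -n kube-system \
    "aws-node=${repo}:${version}"
}
```

Running `current_cni_image` first shows what is deployed; `pin_cni_version v1.5.3` would then roll the DaemonSet to that tag. This changes the image only, so if the manifests differ between versions in other ways, applying the matching manifest is the safer route.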