nginx-ingress failing intermittely
NGINX Ingress controller version: 0.22.0 Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.22.0 Image ID: docker-pullable://quay.io/kubernetes-ingress-controller/nginx-ingress-controller@sha256:47ef793dc8dfcbf73c9dee4abfb87afa3aa8554c35461635f6539c6cc5073b2c
quay.io/kubernetes-ingress-controller/nginx-ingress-controller Kubernetes version (use kubectl version): v1.15.3
Environment:
Cloud provider or hardware configuration: Vm's in Vcenter OS (e.g. from /etc/os-release): VERSION="16.04.6 LTS (Xenial Xerus)" Kernel (e.g. uname -a):Linux appsec-ana01 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 07:56:38 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Install tools: Others: What happened:
my nginx-ingress-controller is failing and restarting inconsistently..
What you expected to happen: No Restart to happen
How to reproduce it (as minimally and precisely as possible): Same Version issue should occur. curl -I http://10.244.10.48:10254/healthz HTTP/1.1 200 OK Date: Wed, 11 Sep 2019 11:15:56 GMT Content-Length: 2 Content-Type: text/plain; charset=utf-8
Anything else we need to know: kubectl get events results in following
2m Warning Unhealthy pod/nginx-ingress-controller-7cfb747d6c-4n4nz Liveness probe failed: Get http://10.244.10.48:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) 30s Warning Unhealthy pod/nginx-ingress-controller-7cfb747d6c-4n4nz Liveness probe failed: Get http://10.244.10.48:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers) 6m5s Warning Unhealthy pod/nginx-ingress-controller-7cfb747d6c-4n4nz Readiness probe failed: Get http://10.244.10.48:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers) 35m Warning Unhealthy pod/nginx-ingress-controller-7cfb747d6c-4n4nz Readiness probe failed: Get http://10.244.10.48:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
curl -I http://10.244.10.48:10254/healthz HTTP/1.1 200 OK Date: Wed, 11 Sep 2019 11:15:56 GMT Content-Length: 2 Content-Type: text/plain; charset=utf-8
The errors in you logs clearly say about health check issues with a particular nginx-ingress-controller*
Pod. Due to the fact that Readiness and Liveness probes are fully managed by kubelet node agent, I would check carefully kubelet
service, in order to fetch any related errors or suspicious events:
$ sudo systemctl status kubelet -l
$ sudo journalctl -u kubelet
Therefore, the issue may occurs with a connectivity from kubelet
node to K8s API server, despite your successful efforts testing curl
requests targeting /health
endpoint.
Meanwhile, you can check logs from Nginx Ingress controller's Pod and inspect bootstrap events:
kubectl logs $(kubectl get po -l app=nginx-ingress -o jsonpath='{.items[0].metadata.name}')
I would also check the K8s Node capacity where origin nginx-ingress-controller*
Pod resides on and observe overall resource utilization for allocated objects across the cluster.
I recommend you to review the official Nginx Ingress-Controller troubleshooting document to get more ways for investigation.
I would say it's not connected with Nginx controller but rather with Kubernetes node. Have you seen this?