I have several .NET Core applications which shut down for no obvious reason. It looks like this has been happening since we implemented health checks, but I'm not able to see any kill command or event in Kubernetes.
cmd
kubectl describe pod mypod
output (the restart count is this high because of the daily shutdown in the evening; stage environment)
Name:           mypod
...
Status:         Running
...
Controlled By:  ReplicaSet/mypod-deployment-6dbb6bcb65
Containers:
  myservice:
    State:          Running
      Started:      Fri, 01 Nov 2019 09:59:40 +0100
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 01 Nov 2019 07:19:07 +0100
      Finished:     Fri, 01 Nov 2019 09:59:37 +0100
    Ready:          True
    Restart Count:  19
    Liveness:       http-get http://:80/liveness delay=10s timeout=1s period=5s #success=1 #failure=10
    Readiness:      http-get http://:80/hc delay=10s timeout=1s period=5s #success=1 #failure=10
    ...
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
...
Events:
  Type     Reason     Age                    From                               Message
  ----     ------     ----                   ----                               -------
  Warning  Unhealthy  18m (x103 over 3h29m)  kubelet, aks-agentpool-40946522-0  Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  18m (x29 over 122m)    kubelet, aks-agentpool-40946522-0  Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
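For reference, the probe lines above correspond to a Deployment spec roughly like this (a reconstruction from the describe output; only the values shown there are certain, everything else is standard probe fields):

containers:
  - name: myservice
    ports:
      - containerPort: 80
    livenessProbe:
      httpGet:
        path: /liveness
        port: 80
      initialDelaySeconds: 10  # delay=10s
      timeoutSeconds: 1        # timeout=1s -- the limit the failing probes run into
      periodSeconds: 5         # period=5s
      successThreshold: 1      # #success=1
      failureThreshold: 10     # #failure=10
    readinessProbe:
      httpGet:
        path: /hc
        port: 80
      initialDelaySeconds: 10
      timeoutSeconds: 1
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 10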
These are the pod's logs
cmd
kubectl logs mypod --previous
output
Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
Application is shutting down...
cmd
kubectl get events
output (what I'm missing here is the kill event; my assumption is that the pod was not restarted because of the multiple failed health checks)
LAST SEEN TYPE REASON OBJECT MESSAGE
39m Normal NodeHasSufficientDisk node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasSufficientDisk
39m Normal NodeHasSufficientMemory node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasSufficientMemory
39m Normal NodeHasNoDiskPressure node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeHasNoDiskPressure
39m Normal NodeReady node/aks-agentpool-40946522-0 Node aks-agentpool-40946522-0 status is now: NodeReady
39m Normal CREATE ingress/my-ingress Ingress default/ebizsuite-ingress
39m Normal CREATE ingress/my-ingress Ingress default/ebizsuite-ingress
7m2s Warning Unhealthy pod/otherpod2 Readiness probe failed: Get http://10.244.0.158:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
7m1s Warning Unhealthy pod/otherpod2 Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m Warning Unhealthy pod/otherpod2 Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
44m Warning Unhealthy pod/otherpod1 Liveness probe failed: Get http://10.244.0.151:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m35s Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
8m8s Warning Unhealthy pod/mypod Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
8m7s Warning Unhealthy pod/mypod Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s Warning Unhealthy pod/otherpod1 Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
curl from another pod (I've run this in a loop every second for a very long time and have never received anything other than a 200 OK)
kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/hc
{
  "status": "Healthy",
  "totalDuration": "00:00:00.0647250",
  "entries": {
    "self":                    { "data": {}, "duration": "00:00:00.0000012", "status": "Healthy" },
    "warmup":                  { "data": {}, "duration": "00:00:00.0000007", "status": "Healthy" },
    "TimeDB-check":            { "data": {}, "duration": "00:00:00.0341533", "status": "Healthy" },
    "time-blob-storage-check": { "data": {}, "duration": "00:00:00.0108192", "status": "Healthy" },
    "time-rabbitmqbus-check":  { "data": {}, "duration": "00:00:00.0646841", "status": "Healthy" }
  }
}
cmd
kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/liveness
Healthy
I think you can:
- Modify the livenessProbe and readinessProbe to check only the root URL on port 80, i.e. cut the path from the URL.
- Remove the livenessProbe and readinessProbe (enabled=false).
- Or just increase the delay to 5 or 10 minutes (see the sketch below); after that you can kubectl exec -it <pod-name> -- sh (or bash) into the pod and debug. You can use netstat to check whether the service is actually listening on port 80. And as a last thing, you can run the same request the probes make with curl -v http://localhost: if this returns a status code other than 200, that is why your pods keep restarting.
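For example, the relaxed probes could look roughly like this (a sketch; the field values are illustrative, pick what fits your app):

livenessProbe:
  httpGet:
    path: /               # path cut down, or keep /liveness once it works
    port: 80
  initialDelaySeconds: 300  # 5 minutes, enough time to exec in and debug
  timeoutSeconds: 5         # more headroom than the current 1s
  periodSeconds: 5
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 300
  timeoutSeconds: 5
  periodSeconds: 5
  failureThreshold: 10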
Hope this helps.
From the logs it appears the issue is with the liveness and readiness probes: they are failing, and that is why the application keeps getting restarted.
Remove the probes and check whether the application stays up. Then get inside the pod and call the liveness and readiness endpoints yourself to investigate why they are failing.
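For example, you could temporarily comment the probes out of the Deployment and redeploy (a sketch, assuming a standard Deployment manifest; the probe values are the ones from the describe output above):

containers:
  - name: myservice
    ports:
      - containerPort: 80
    # Probes disabled temporarily to see whether the app stays up on its own:
    # livenessProbe:
    #   httpGet:
    #     path: /liveness
    #     port: 80
    # readinessProbe:
    #   httpGet:
    #     path: /hc
    #     port: 80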