Identify the reason for an application shutdown in Kubernetes

11/1/2019

I have several .NET Core applications that shut down for no obvious reason. This seems to have started since health checks were implemented, but I can't see any kill commands in Kubernetes.

cmd

kubectl describe pod mypod

output (the restart count is this high because of a daily shutdown in the evening; staging environment)

Name:               mypod
...
Status:             Running
...
Controlled By:      ReplicaSet/mypod-deployment-6dbb6bcb65
Containers:
  myservice:
    State:          Running
      Started:      Fri, 01 Nov 2019 09:59:40 +0100
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 01 Nov 2019 07:19:07 +0100
      Finished:     Fri, 01 Nov 2019 09:59:37 +0100
    Ready:          True
    Restart Count:  19
    Liveness:       http-get http://:80/liveness delay=10s timeout=1s period=5s #success=1 #failure=10
    Readiness:      http-get http://:80/hc delay=10s timeout=1s period=5s #success=1 #failure=10
    ...
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
...
Events:
  Type     Reason     Age                    From                               Message
  ----     ------     ----                   ----                               -------
  Warning  Unhealthy  18m (x103 over 3h29m)  kubelet, aks-agentpool-40946522-0  Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  18m (x29 over 122m)    kubelet, aks-agentpool-40946522-0  Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
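
For reference, the probe configuration reported above maps to roughly this snippet under the container entry of the deployment spec (the container name and probe values are taken from the describe output; the rest of the manifest is omitted):

containers:
  - name: myservice
    livenessProbe:
      httpGet:
        path: /liveness
        port: 80
      initialDelaySeconds: 10   # delay=10s
      timeoutSeconds: 1         # timeout=1s
      periodSeconds: 5          # period=5s
      successThreshold: 1       # #success=1
      failureThreshold: 10      # #failure=10
    readinessProbe:
      httpGet:
        path: /hc
        port: 80
      initialDelaySeconds: 10
      timeoutSeconds: 1
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 10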

These are the pod's logs

cmd

kubectl logs mypod --previous

output

Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:80
Application started. Press Ctrl+C to shut down.
Application is shutting down...

The corresponding log from Azure is in the screenshot "azure-log" (not reproduced here).

cmd

kubectl get events

output (what I'm missing here is a kill event; my assumption is that the pod was not restarted because of the multiple failed health checks)

LAST SEEN   TYPE      REASON                    OBJECT                                                      MESSAGE
39m         Normal    NodeHasSufficientDisk     node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasSufficientDisk
39m         Normal    NodeHasSufficientMemory   node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasSufficientMemory
39m         Normal    NodeHasNoDiskPressure     node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeHasNoDiskPressure
39m         Normal    NodeReady                 node/aks-agentpool-40946522-0                               Node aks-agentpool-40946522-0 status is now: NodeReady
39m         Normal    CREATE                    ingress/my-ingress                                          Ingress default/ebizsuite-ingress
39m         Normal    CREATE                    ingress/my-ingress                                          Ingress default/ebizsuite-ingress
7m2s        Warning   Unhealthy                 pod/otherpod2                                               Readiness probe failed: Get http://10.244.0.158:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
7m1s        Warning   Unhealthy                 pod/otherpod2                                               Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m         Warning   Unhealthy                 pod/otherpod2                                               Liveness probe failed: Get http://10.244.0.158:80/liveness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
44m         Warning   Unhealthy                 pod/otherpod1                                               Liveness probe failed: Get http://10.244.0.151:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
5m35s       Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
40m         Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
8m8s        Warning   Unhealthy                 pod/mypod                                                   Readiness probe failed: Get http://10.244.0.146:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
8m7s        Warning   Unhealthy                 pod/mypod                                                   Liveness probe failed: Get http://10.244.0.146:80/liveness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
0s          Warning   Unhealthy                 pod/otherpod1                                               Readiness probe failed: Get http://10.244.0.151:80/hc: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

curl from another pod (I ran this in a loop every second for a long time and never received anything other than a 200 OK)

kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/hc

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
{"status":"Healthy","totalDuration":"00:00:00.0647250","entries":{"self":{"data":{},"duration":"00:00:00.0000012","status":"Healthy"},"warmup":{"data":{},"duration":"00:00:00.0000007","status":"Healthy"},"TimeDB-check":{"data":{},"duration":"00:00:00.0341533","status":"Healthy"},"time-blob-storage-check":{"data":{},"duration":"00:00:00.0108192","status":"Healthy"},"time-rabbitmqbus-check":{"data":{},"duration":"00:00:00.0646841","status":"Healthy"}}}100   454    0   454    0     0   6579      0 --:--:-- --:--:-- --:--:--  6579

curl

kubectl exec -t otherpod1 -- curl --fail http://10.244.0.146:80/liveness

Healthy  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     7    0     7    0     0   7000      0 --:--:-- --:--:-- --:--:--  7000
-- masterchris_99
asp.net
asp.net-core
c#
docker
kubernetes

2 Answers

11/1/2019

I think you can try one of the following:

  1. Modify the liveness and readiness probes to check only http://:80, i.e. drop the path from the URL.

  2. Remove the liveness and readiness probes (disable them, e.g. enabled=false).

  3. Increase the initial delay to 5 or 10 minutes, then kubectl exec -it <pod-name> -- sh (or bash) into the pod and debug. Use netstat to check whether the service is actually listening on port 80. Finally, run the same request the probes make, curl -v http://localhost, and if it returns anything other than 200, that is why your pods keep restarting. A probe configuration along these lines is sketched below.
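
A minimal sketch of option 3 in the deployment manifest (paths and port taken from the question; the longer initialDelaySeconds is illustrative, and timeoutSeconds is also raised here above the current 1s because the events show the probes failing on timeouts):

livenessProbe:
  httpGet:
    path: /liveness
    port: 80
  initialDelaySeconds: 300   # no liveness checks for 5 minutes, leaving time to debug
  timeoutSeconds: 5          # current setting is 1s, which the probe requests are exceeding
  periodSeconds: 5
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /hc
    port: 80
  initialDelaySeconds: 300
  timeoutSeconds: 5
  periodSeconds: 5
  failureThreshold: 10

With the probes relaxed like this you have time to kubectl exec into the pod and run netstat and curl -v http://localhost:80/liveness from inside the container (assuming the image ships a shell and those tools).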

Hope this helps.

-- Thọ Quách
Source: StackOverflow

11/1/2019

From the logs it appears that the issue is with the liveness and readiness probes: they are failing, and that is why the application keeps getting restarted.

Remove the probes and check whether the application stays up. Then get inside the pod and run the liveness and readiness checks manually to investigate why they are failing.

-- P Ekambaram
Source: StackOverflow