Sudden pod restart of a Kubernetes deployment, reason?

4/9/2021

I've got microservices deployed on GKE with Helm v3; all apps/Helm releases have been running fine for months, but yesterday, for some reason, the pods were re-created:

kubectl get pods -l app=myapp  

NAME                     READY   STATUS    RESTARTS   AGE
myapp-75cb966746-grjkj   1/1     Running   1          14h
myapp-75cb966746-gz7g7   1/1     Running   0          14h
myapp-75cb966746-nmzzx   1/1     Running   1          14h

helm3 history myapp shows it was updated 2 days ago (40+ hrs), not yesterday, so I can exclude the possibility that someone simply ran helm3 upgrade ..; it seems more like someone ran kubectl rollout restart deployment/myapp. Any thoughts on how I can check why the pods were restarted? I'm not sure how to verify it. PS: the logs from kubectl logs deployment/myapp only go back 3 hours.
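
I guess I could at least compare the ReplicaSets and the rollout history, since an upgrade or a rollout restart would create a new ReplicaSet (new pod-template hash) and bump the revision, while plain pod deletions would keep the same ReplicaSet; something like (label and deployment name as above):

kubectl get replicaset -l app=myapp
kubectl rollout history deployment/myapp
kubectl get pods -l app=myapp -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}'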


Just for reference, I'm not asking about kubectl logs -p myapp-75cb966746-grjkj; there is no problem with that. I want to know what happened to the 3 pods that were there 14 hrs ago and were simply deleted/replaced, and how to check that.


Also, there are no events on the cluster:

MacBook-Pro% kubectl get events
No resources found in myns namespace.
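
Maybe the GKE admin activity audit logs in Cloud Logging still contain the delete calls for the old pods? I'm thinking of something along these lines, though I'm not sure about the exact filter, so treat it as a guess:

gcloud logging read 'resource.type="k8s_cluster" AND protoPayload.methodName:"pods.delete"' --limit=20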

As for describing the deployment, all it shows is that the deployment was created a few months ago:

CreationTimestamp:      Thu, 22 Oct 2020 09:19:39 +0200

and that the last update was >40 hrs ago:

lastUpdate: 2021-04-07 07:10:09.715630534 +0200 CEST m=+1.867748121

Here is the full describe output, in case it helps:

MacBook-Pro% kubectl describe deployment myapp
Name:                   myapp
Namespace:              myns
CreationTimestamp:      Thu, 22 Oct 2020 09:19:39 +0200
Labels:                 app=myapp
Annotations:            deployment.kubernetes.io/revision: 42
                        lastUpdate: 2021-04-07 07:10:09.715630534 +0200 CEST m=+1.867748121
                        meta.helm.sh/release-name: myapp
                        meta.helm.sh/release-namespace: myns
Selector:               app=myapp,env=myns
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        5
RollingUpdateStrategy:  25% max unavailable, 1 max surge
Pod Template:
  Labels:       app=myapp
  Annotations:  kubectl.kubernetes.io/restartedAt: 2020-10-23T11:21:11+02:00
  Containers:
   myapp:
    Image:      xxx
    Port:       8080/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1G
    Requests:
      cpu:      1
      memory:   1G
    Liveness:   http-get http://:myappport/status delay=45s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:myappport/status delay=45s timeout=5s period=10s #success=1 #failure=3
    Environment Variables from:
      myapp-myns  Secret  Optional: false
    Environment:
      myenv: myval
    Mounts:
      /some/path from myvol (ro)
  Volumes:
   myvol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      myvol
    Optional:  false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   myapp-75cb966746 (3/3 replicas created)
Events:          <none>
-- potatopotato
google-cloud-platform
google-kubernetes-engine
kubernetes
logging

3 Answers

4/9/2021

You can use

kubectl describe pod your_pod_name

where, under Containers.your_container_name.lastState, you get the time and the reason why the last container was terminated (for example, due to an error or due to being OOMKilled).

doc reference:

kubectl explain pod.status.containerStatuses.lastState

KIND:     Pod
VERSION:  v1

RESOURCE: lastState <Object>

DESCRIPTION:
     Details about the container's last termination condition.

     ContainerState holds a possible state of container. Only one of its members
     may be specified. If none of them is specified, the default one is
     ContainerStateWaiting.

FIELDS:
   running      <Object>
     Details about a running container

   terminated   <Object>
     Details about a terminated container

   waiting      <Object>
     Details about a waiting container

Example from one of my containers, which terminated due to an error in the application:

Containers:
  my_container:
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 06 Apr 2021 16:28:57 +0300
      Finished:     Tue, 06 Apr 2021 16:32:07 +0300
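
If you want to pull just this field for all pods at once, a jsonpath query along these lines should work (assuming one container per pod, and the app=myapp label from the question):

kubectl get pods -l app=myapp -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState}{"\n"}{end}'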

To get the previous logs of your container (the restarted one), you can use the --previous flag, like this:

kubectl logs your_pod_name --previous
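
Note that --previous only shows the logs of the previous container instance of a pod that still exists; it won't help for pods that were deleted entirely. To dump the previous logs of every current pod behind the deployment (label taken from the question), something like this should work:

for p in $(kubectl get pods -l app=myapp -o name); do
  echo "== $p =="
  kubectl logs "$p" --previous --tail=50   # fails with an error for pods that never restarted
done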
-- Andrew
Source: StackOverflow

4/9/2021

I suggest you run kubectl describe deployment <deployment-name> and kubectl describe pod <pod-name>.

In addition, kubectl get events will show cluster-level events and may help you understand what happened.
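
Keep in mind that events are only retained for a limited time (the kube-apiserver --event-ttl flag defaults to one hour), which would explain an empty kubectl get events 14 hours after the fact. While events are still around, sorting and filtering helps, for example (pod name taken from your output):

kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=myapp-75cb966746-grjkj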

-- shaki
Source: StackOverflow

4/9/2021

First things first, I would check the nodes on which the Pods were running.

  • If a Pod is restarted (which means that the RESTART COUNT is incremented), it usually means that the Pod had an error and that error caused the Pod to crash.
  • In your case though, the Pods were completely recreated, which means (like you said) that someone could have used a rollout restart, or the deployment was scaled down and then up (both manual operations); see the check below for the rollout restart case.
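
kubectl rollout restart works by stamping the Pod template with the kubectl.kubernetes.io/restartedAt annotation, so you can read it back and compare it with when the Pods were recreated, for example (deployment name from the question):

kubectl get deployment myapp -o jsonpath='{.spec.template.metadata.annotations.kubectl\.kubernetes\.io/restartedAt}'

In your describe output that annotation still shows 2020-10-23, which makes a recent rollout restart look unlikely.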

The most common case for Pods to be recreated automatically is that the node or nodes the Pods were executing on had a problem. If a node becomes NotReady, even for a small amount of time, the Kubernetes scheduler will try to schedule new Pods on other nodes in order to match the desired state (number of replicas and so on).

Old Pods on a NotReady node will go into Terminating state and will be forcibly terminated as soon as the node becomes Ready again (if they are still up and running).

This is described in detail in the documentation (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifetime):

If a Node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period. Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances.
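
To check whether something like that happened to your nodes, look at the node conditions and their last transition times, for example:

kubectl get nodes
kubectl describe node <node-name>   # check the Conditions section and its LastTransitionTime values

On GKE, node auto-repair, auto-upgrade, and preemptible/spot VM reclamation can also drain a node and recreate its Pods, so the node pool operation history (gcloud container operations list) is worth a look too.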

-- AndD
Source: StackOverflow