How to identify unhealthy pods in a statefulset

7/27/2021

I have a StatefulSet with 6 replicas.

All of a sudden the StatefulSet thinks there are 5 ready replicas out of 6. When I look at the pod status, all 6 pods are ready and all of their readiness checks have passed (1/1).

Now I am trying to find logs or status information that shows which pod the StatefulSet considers unhealthy, so I can debug further.

Where can I find information or logs for the StatefulSet that could tell me which pod is unhealthy? I have already checked the output of describe pods and describe statefulset, but neither of them shows which pod is unhealthy.

-- user3435964
kubernetes
kubernetes-statefulset

1 Answer

7/27/2021

So let's say you created the following StatefulSet:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    user: anurag
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      user: anurag # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 6 # by default is 1
  template:
    metadata:
      labels:
        user: anurag # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 1Gi
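
Apply it (assuming you saved the manifest as web.yaml; the file name here is just for illustration):

kubectl apply -f web.yaml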

The result is:

kubectl get StatefulSet web -o wide
NAME   READY   AGE     CONTAINERS   IMAGES
web    6/6     8m31s   nginx        k8s.gcr.io/nginx-slim:0.8

We can also check the StatefulSet's status with:

kubectl get statefulset web -o yaml
status:
  collisionCount: 0
  currentReplicas: 6
  currentRevision: web-599978b754
  observedGeneration: 1
  readyReplicas: 6
  replicas: 6
  updateRevision: web-599978b754
  updatedReplicas: 6
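
If you only want the ready count, a quick way (assuming the StatefulSet is named web, as in this example) is to pull just those status fields with jsonpath:

# prints readyReplicas/replicas, e.g. "6/6"
kubectl get statefulset web -o jsonpath='{.status.readyReplicas}/{.status.replicas}{"\n"}'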

As per Debugging a StatefulSet, you can list all the pods that belong to the current StatefulSet using labels.

$  kubectl get pods -l user=anurag
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          13m
web-1   1/1     Running   0          12m
web-2   1/1     Running   0          12m
web-3   1/1     Running   0          12m
web-4   1/1     Running   0          12m
web-5   1/1     Running   0          11m

At this point, if any of your pods are unavailable, you will definitely see it here. The next step is Debug Pods and ReplicationControllers, which includes checking whether you have sufficient resources to start all of these pods, and so on.
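
If all the pods look Ready in kubectl get pods but the StatefulSet still reports fewer ready replicas (as in the question), it can also help to read each pod's Ready condition directly and compare. A minimal sketch, assuming the same user=anurag label from the example above:

# prints each pod name and the status of its Ready condition
kubectl get pods -l user=anurag \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'

Any pod whose Ready condition is not True is the one the StatefulSet controller is not counting.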

Describing the problematic pod (kubectl describe pod web-0) should give you the answer to why that happened, at the very end, in the Events section.
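
If you would rather not describe every pod one by one, you can also query the events themselves, for example sorted by time or filtered to a single object (the pod name web-0 is just from this example):

# all recent events in the namespace, newest last
kubectl get events --sort-by=.lastTimestamp

# only the events that reference pod web-0
kubectl get events --field-selector involvedObject.name=web-0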


For example, if you use the original YAML from the StatefulSet components example as it is, you will get an error and none of your pods will be up and running. (The reason is storageClassName: "my-storage-class", a storage class the cluster does not have.)

The exact error, and an understanding of what is happening, comes from describing the problematic pod... that's how it works.

kubectl describe pod web-0
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  31s (x2 over 31s)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
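
In that case the next step would be to look at the PersistentVolumeClaims created from the volumeClaimTemplates, for example (www-web-0 follows the <template name>-<pod name> convention of this example):

# the claim for web-0 will be stuck in Pending if the storage class does not exist
kubectl get pvc
kubectl describe pvc www-web-0
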
-- Vit
Source: StackOverflow