I have a StatefulSet with 6 replicas.
All of a sudden the StatefulSet thinks there are 5 ready replicas out of 6. When I look at the pod status, all 6 pods are ready with all the readiness checks passed (1/1).
Now I am trying to find logs or status that shows which pod is unhealthy as per the StatefulSet, so I could debug further.
Where can I find information or logs for the StatefulSet that could tell me which pod is unhealthy? I have already checked the output of describe pods and describe statefulset, but neither of them shows which pod is unhealthy.
So let's say you created the following StatefulSet:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    user: anurag
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      user: anurag # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 6 # by default is 1
  template:
    metadata:
      labels:
        user: anurag # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 1Gi
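If you save the manifest above as, say, web.yaml (the filename is just an assumption for this example), you can create the Service and StatefulSet with:
kubectl apply -f web.yaml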
Result is:
kubectl get StatefulSet web -o wide
NAME   READY   AGE     CONTAINERS   IMAGES
web    6/6     8m31s   nginx        k8s.gcr.io/nginx-slim:0.8
We can also check the StatefulSet's status with:
kubectl get statefulset web -o yaml
status:
  collisionCount: 0
  currentReplicas: 6
  currentRevision: web-599978b754
  observedGeneration: 1
  readyReplicas: 6
  replicas: 6
  updateRevision: web-599978b754
  updatedReplicas: 6
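If you only care about the ready/desired counts as the controller sees them, one quick way (a sketch using kubectl's built-in jsonpath output, nothing else assumed) is:
kubectl get statefulset web -o jsonpath='{.status.readyReplicas}/{.status.replicas}{"\n"}'
If readyReplicas is lower than replicas, the controller disagrees with what you see in kubectl get pods, which is exactly the situation described in the question.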
As per Debugging a StatefulSet, you can list all the pods which belong to the current StatefulSet using labels.
$ kubectl get pods -l user=anurag
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          13m
web-1   1/1     Running   0          12m
web-2   1/1     Running   0          12m
web-3   1/1     Running   0          12m
web-4   1/1     Running   0          12m
web-5   1/1     Running   0          11m
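To see which pod the controller considers not ready, you can also print each pod's Ready condition directly (a sketch using jsonpath; the label selector matches the example above):
kubectl get pods -l user=anurag -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
Any pod printing False here is the one the StatefulSet is not counting as ready.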
Here, at this point, if any of your pods isn't available, you will definitely see that. The next step is Debug Pods and ReplicationControllers, which includes checking whether you have sufficient resources to start all these pods, and so on.
Describing the problematic pod (kubectl describe pod web-0) should give you the answer to why that happened, at the very end in the Events section.
For example, if you use the original YAML as-is from StatefulSet components for this example, you will get an error and none of your pods will be up and running. (The reason is storageClassName: "my-storage-class", which does not exist in the cluster.)
The exact error, and an understanding of what is happening, comes from describing the problematic pod. That's how it works.
kubectl describe pod web-0
Events:
  Type     Reason            Age                From               Message
  ----     ------            ---                ----               -------
  Warning  FailedScheduling  31s (x2 over 31s)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
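Since the scheduling failure here points at unbound PersistentVolumeClaims, a reasonable follow-up (a sketch; PVCs created from volumeClaimTemplates are named <template-name>-<pod-name>, so www-web-0 for the first pod) is to inspect the claims and the pod's events themselves:
kubectl get events --field-selector involvedObject.name=web-0
kubectl get pvc
kubectl describe pvc www-web-0
The describe output of a Pending PVC will usually tell you whether the referenced StorageClass exists and why no PersistentVolume could be bound.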