By definition, kube_pod_container_status_waiting_reason is supposed to capture the reasons for a pod being in Waiting status. I have several pods in my Kubernetes cluster that are in CrashLoopBackOff, but I don't see that reason captured by kube_pod_container_status_waiting_reason. It only captures two reasons: ErrImagePull and ContainerCreating.
~$ k get pods -o wide --show-all --all-namespaces | grep Crash
cattle-system cattle-cluster-agent-6f744c67cc-jlkjh 0/1 CrashLoopBackOff 2885 10d 10.233.121.247 k8s-4
cattle-system cattle-node-agent-6klkh 0/1 CrashLoopBackOff 2886 171d 10.171.201.127 k8s-2
cattle-system cattle-node-agent-j6r94 0/1 CrashLoopBackOff 2887 171d 10.171.201.110 k8s-3
cattle-system cattle-node-agent-nkfcq 0/1 CrashLoopBackOff 17775 171d 10.171.201.131 k8s-1
cattle-system cattle-node-agent-np76b 0/1 CrashLoopBackOff 2887 171d 10.171.201.89 k8s-4
cattle-system cattle-node-agent-pwn5v 0/1 CrashLoopBackOff 2859 171d 10.171.202.72 k8s-5
Running sum by (reason) (kube_pod_container_status_waiting_reason) in Prometheus yields these results:
Element Value
{reason="ContainerCreating"} 0
{reason="ErrImagePull"} 0
I am running the quay.io/coreos/kube-state-metrics:v1.2.0 image of kube-state-metrics.
What am I missing? Why is the CrashLoopBackOff reason not showing up in the query? I would like to set up an alert that finds pods in the Waiting status along with the reason. So I am thinking of combining kube_pod_container_status_waiting, to find the pods in the Waiting status, with kube_pod_container_status_waiting_reason, to find the exact reason.
Please assist. Thank you!
You are running into this. Basically, it looks like you are using kube-state-metrics 1.2.0 or earlier. The ImagePullBackOff and CrashLoopBackOff reasons were added in 1.3.0.
So update your image to:
k8s.gcr.io/kube-state-metrics:v1.3.0
quay.io/coreos/kube-state-metrics:v1.3.0
or
k8s.gcr.io/kube-state-metrics:v1.4.0
quay.io/coreos/kube-state-metrics:v1.4.0
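Once you are on a version that reports CrashLoopBackOff, a query along these lines might serve as the basis for your alert. This is only a sketch: it assumes the metric carries namespace, pod, and container labels alongside reason, and that its value is 1 while a container is waiting for that reason, so adjust it to whatever your version actually exposes.

```promql
# Containers currently waiting because of CrashLoopBackOff.
# Assumption: the metric has namespace, pod, container and reason labels
# and is 1 while the container is waiting for the given reason.
sum by (namespace, pod, container, reason) (
  kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}
) > 0
```

Since the reason metric is already tied to the waiting state, you probably do not need to join it with kube_pod_container_status_waiting. Dropping the reason matcher should list every waiting reason your version reports.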