promql query to alert on pods in crashloopbackoff

5/28/2020

I want to create an alert for a pod with a specific label when it enters CrashloopBackOff. I'm using kube-state-metrics to find when the pod enters CrashloopBackOff. I tried the following promql query to create the alert rule based on the kube-state-metrics design of return 1 or 0 so that you can filter by multiplication

kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|Error"} * on (pod) group_left() kube_pod_labels{my_label="my-value"} > 0

however, I get the error many-to-many matching not allowed: matching labels must be unique on one side. Full message:

Error executing query: found duplicate series for the match group {pod="test-pod"} on the right hand-side of the operation: [{__name__="kube_pod_labels", instance="XXX-kube-state-metrics.default.svc.cluster.local:8080", job="kube-state-metrics", my_label="my-value", namespace="XXX-1", pod="test-pod"}, {__name__="kube_pod_labels", instance="XXX-kube-state-metrics.default.svc.cluster.local:8080", job="kube-state-metrics", another_label="XXX", my_label="my-value", namespace="XXX-2", pod="test-pod"}];many-to-many matching not allowed: matching labels must be unique on one side

But this query does work:

kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|Error"} > 0 and on(pod) kube_pod_labels{label_seldon_app="prediction-service"} > 0

How can I get this to work so that it returns all the pods in this state with that label across namespaces using * instead of and?

-- Dan
kube-state-metrics
kubernetes
prometheus
prometheus-alertmanager
promql

0 Answers