I want to create an alert for a pod with a specific label when it enters CrashloopBackOff. I'm using kube-state-metrics to find when the pod enters CrashloopBackOff. I tried the following promql query to create the alert rule based on the kube-state-metrics design of return 1
or 0
so that you can filter by multiplication
kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|Error"} * on (pod) group_left() kube_pod_labels{my_label="my-value"} > 0
however, I get the error many-to-many matching not allowed: matching labels must be unique on one side
. Full message:
Error executing query: found duplicate series for the match group {pod="test-pod"} on the right hand-side of the operation: [{__name__="kube_pod_labels", instance="XXX-kube-state-metrics.default.svc.cluster.local:8080", job="kube-state-metrics", my_label="my-value", namespace="XXX-1", pod="test-pod"}, {__name__="kube_pod_labels", instance="XXX-kube-state-metrics.default.svc.cluster.local:8080", job="kube-state-metrics", another_label="XXX", my_label="my-value", namespace="XXX-2", pod="test-pod"}];many-to-many matching not allowed: matching labels must be unique on one side
But this query does work:
kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|Error"} > 0 and on(pod) kube_pod_labels{label_seldon_app="prediction-service"} > 0
How can I get this to work so that it returns all the pods in this state with that label across namespaces using *
instead of and
?