Prometheus/Grafana alerting on a pod stuck in Pending state

12/19/2020

I'm new to running Prometheus and Grafana. I want to create an alert that fires when a Kubernetes pod has been in the Pending phase for more than 15 minutes. The PromQL query I'm using is:

kube_pod_status_phase{exported_namespace="mynamespace", phase="Pending"} > 0

What I haven't been able to figure out is how to construct an alert based on how long the pod has been in that state. I've tried a few permutations of alert conditions in Grafana along the lines of:

WHEN avg() OF query (A, 15m, now) IS ABOVE 1

They all fire an alert based on the number of pods in the state rather than the duration.

How can an alert be constructed based on the time spent in the state?

Please & Thank You

-- lovecraft66
grafana
grafana-alerts
kubernetes
prometheus
promql

1 Answer

8/30/2021
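- alert: KubernetesPodNotHealthy
  # The subquery evaluates the sum once per minute over the last 15 minutes;
  # min_over_time(...) > 0 holds only if every evaluated step was non-zero.
  expr: min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    # sum by (namespace, pod) drops the instance label, so reference the labels that survive.
    summary: Kubernetes Pod not healthy ({{ $labels.namespace }}/{{ $labels.pod }})
    description: "Pod has been in a non-ready state for longer than 15 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"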
-- dansl1982
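
For comparison, the duration condition can also be expressed with the rule's own "for" clause instead of a subquery. The following is a minimal sketch, not the answerer's method: it reuses the selector from the question, and the group name, alert name, and severity are hypothetical placeholders. It assumes the rule is loaded by Prometheus itself (a rule file) rather than configured through Grafana's alert editor.

```yaml
groups:
  - name: pod-pending-example            # hypothetical group name
    rules:
      - alert: KubernetesPodPendingTooLong   # hypothetical alert name
        # Selector copied from the question; adjust the label name
        # (exported_namespace vs. namespace) to match your scrape setup.
        expr: kube_pod_status_phase{exported_namespace="mynamespace", phase="Pending"} > 0
        # The alert stays in the "pending" alert state until expr has been
        # true at every rule evaluation for 15 minutes, and only then fires.
        for: 15m
        labels:
          severity: warning              # assumed severity, pick your own
        annotations:
          summary: A pod in mynamespace has been Pending for more than 15 minutes
          description: "LABELS = {{ $labels }}"
```

The two forms are likely to differ at the edges. With "for: 15m", any evaluation at which the expression returns no result resets the timer. With min_over_time over a subquery and "for: 0m", only the samples that exist in the window are considered, so a pod created less than 15 minutes ago may trigger the alert sooner than intended.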
Source: StackOverflow