Grafana alert flapping at specific interval

6/22/2018

I created an alert for my Prometheus data that is flapping every 30 seconds, which is the evaluation interval I set. The alert is meant to fire when the desired number of pods doesn’t equal the number of available pods in my k8s cluster for an entire 15-minute period. The alert appears to report the metric at the value it had when it last fired, but if I click “test alert”, the JSON returned indicates that there shouldn’t be an alert right now. I’m not sure why it’s flapping, and any insight would be greatly appreciated. Here’s the relevant info:

Alert Query: (sum(kube_deployment_spec_replicas{namespace="default"}) without (deployment, instance, pod)) - (sum(kube_deployment_status_replicas_available{namespace="default"}) without (deployment, instance, pod))

Condition: WHEN min() OF query(G,15m,now) IS ABOVE 0.5

If no data or all values are null set state to Ok

If execution error or timeout set state to keep last state
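
For reference, that condition reduces the last 15 minutes of query G with min(), so it should only fire when every sample in the window is above 0.5. A rough way to look at the same window directly against Prometheus is a range query. This is just a sketch, assuming the same prometheus-k8s endpoint and a shell where $(date +%s) works:

# same expression as the alert, over the last 15 minutes (900s) at 30s resolution
curl -k -s -G 'https://prometheus-k8s/api/v1/query_range' \
  --data-urlencode 'query=(sum(kube_deployment_spec_replicas{namespace="default"}) without (deployment, instance, pod)) - (sum(kube_deployment_status_replicas_available{namespace="default"}) without (deployment, instance, pod))' \
  --data-urlencode "start=$(( $(date +%s) - 900 ))" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=30'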

Edit: When I run the query directly against Prometheus, I get the following response:

curl -k -s 'https://prometheus-k8s/api/v1/query?query=(sum(kube_deployment_spec_replicas%7Bnamespace%3D%22default%22%7D)%20without%20(deployment%2C%20instance%2C%20pod))%20-%20(sum(kube_deployment_status_replicas_available%7Bnamespace%3D%22default%22%7D)%20without%20(deployment%2C%20instance%2C%20pod))'

{"status":"success","data":{"resultType":"vector","result":[{"metric":{"endpoint":"https-main","job":"kube-state-metrics","namespace":"default","service":"kube-state-metrics"},"value":[1529946877.247,"0"]}]}}%

-- Jon Aumann
alert
grafana
grafana-alerts
kubernetes
prometheus

0 Answers