I use prometheus operator for a deployment of a monitoring stack on kubernetes. I would like to know if there is a way to be aware if the config deployed by the config reloader failed. This is valable for prometheus and alert manager ressources that use a config reloader container to reload their configs. When the config failed. We have a log in the container but can we have a notification or an alert based on a failed config reloading ?
Prometheus exposes a /metric endpoint you can scrape. In particular, there is a metric indicating if the last reload suceeded:
# HELP prometheus_config_last_reload_successful Whether the last configuration reload attempt was successful.
# TYPE prometheus_config_last_reload_successful gauge
prometheus_config_last_reload_successful 0
You can use it to alert on failed reload.
groups:
- name: PrometheusAlerts
rules:
- alert: FailedReload
expr: prometheus_config_last_reload_successful == 0
for: 5m
labels:
severity: warning
annotations:
description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{ $labels.pod}}.
summary: Prometheus configuration reload has failed