Where can I find which rules failed in Prometheus?

5/4/2021

I got this alert:

Alert: PrometheusRuleFailures - critical
Description: Prometheus monitoring/prometheus-prometheus-kube-prometheus-prometheus-0 has failed to evaluate 30 rules in the last 5m.
Details:
  • alertname: PrometheusRuleFailures
  • container: prometheus
  • endpoint: web
  • instance: 10.244.0.159:9090
  • job: prometheus-kube-prometheus-prometheus
  • namespace: monitoring
  • pod: prometheus-prometheus-kube-prometheus-prometheus-0
  • prometheus: monitoring/prometheus-kube-prometheus-prometheus
  • rule_group: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0/monitoring-prometheus-kube-prometheus-kubelet.rules.yaml;kubelet.rules
  • service: prometheus-kube-prometheus-prometheus
  • severity: critical

But when I fetch the logs from the pod, there is no error-level output related to this, only warn and info entries:

level=warn ts=2021-05-04T13:36:57.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:02.027Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:32.026Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.987Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:02.028Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.987Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
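
(For what it's worth, those warnings seem to say that kubelet_node_name exists twice for the same instance, once per Service that scrapes the kubelet. If that reading is right, the duplicate series should show up with a query like the one below in the expression browser; the selector is just copied from the failing expressions above.)

    count by (instance) (kubelet_node_name{job="kubelet", metrics_path="/metrics"}) > 1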

But where can I find out which rules (all 30 of them) failed? (I'm using kube-prometheus-stack.)

-- Kokizzu
kubernetes
prometheus
prometheus-alertmanager

1 Answer

3/16/2022

Your best starting point is the Rules page of the Prometheus UI (:9090/rules).
It shows the evaluation state and the last error for each rule, so the failing ones are easy to spot.
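
If the UI is not handy, the same information is available programmatically: the HTTP API endpoint /api/v1/rules returns every rule group with a per-rule health and lastError field. The PrometheusRuleFailures alert itself is, as far as I know, driven by Prometheus's own self-monitoring metric prometheus_rule_evaluation_failures_total, so a query along these lines (adjust the time window to taste) lists the failing rule groups directly:

    sum by (rule_group) (increase(prometheus_rule_evaluation_failures_total[5m])) > 0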

-- Rohlik
Source: StackOverflow