I have a Prometheus pod running alongside my kube-state-metrics (KSM) pod. KSM exposes state metrics for all the pods across all the namespaces in the cluster. Prometheus simply scrapes the metrics from KSM, so it doesn't need to scrape the individual pods.
When pods are deployed, their Deployment carries certain pod-related labels, as shown below. Two of them are important: APP and TEAM:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    APP: AppABC
    TEAM: TeamABC
...
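One thing worth noting (a sketch, assuming standard Deployment semantics): kube-state-metrics reports the labels of the pods themselves, so for APP and TEAM to show up in kube_pod_labels they also need to be on the Deployment's pod template, not only on the Deployment's own metadata:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    APP: AppABC
    TEAM: TeamABC
spec:
  selector:
    matchLabels:
      APP: AppABC
  template:
    metadata:
      labels:
        APP: AppABC    # kube_pod_labels is built from these pod labels
        TEAM: TeamABC
...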
Within Prometheus, my scrape configuration looks like this:
scrape_configs:
  - job_name: 'pod monitoring'
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
...
The problem is that when Prometheus scrapes the information from kube-state-metrics, it overwrites the app label with kube-state-metrics. For example, the metric below is actually for an app called "AppABC", yet Prometheus overwrote its app label to kube-state-metrics.
kube_pod_container_status_restarts_total{
  app="kube-state-metrics",
  container="appabccontainer",
  job="pod monitoring",
  namespace="test-namespace",
  pod="appabc-766cbcb68d-29smr"
}
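As far as I can tell, this is the labelmap rule doing exactly what it's told: with role: pod, the scrape target is the kube-state-metrics pod itself, so that pod's own labels get mapped onto every series it exposes. honor_labels: true doesn't prevent this, because the exposed KSM metrics don't carry an app label of their own for the target label to conflict with. A typical kube-state-metrics pod has something like this (label name assumed; check your KSM manifest):

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: kube-state-metrics   # labelmap copies this onto every scraped
                              # series as app="kube-state-metrics"
...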
Is there any way for me to scrape the metrics from kube-state-metrics BUT keep the APP and TEAM labels intact, without overwriting them?
EDIT - I figured it out
My Issue: My deployment and pods have certain labels defined (APP, TEAM). kube-state-metrics picks these up from the Kubernetes API, but when Prometheus scrapes kube-state-metrics, the resulting series don't carry those labels.
My Objective: Expose those labels into Prometheus.
My Solution: Using PromQL, you can join two metrics on their shared labels. So in my prometheus-rules.yaml, I changed this:
expr: kube_pod_status_phase{phase="Failed"} > 0
to this:
expr: kube_pod_status_phase{phase="Failed"} * on (pod,namespace) group_right kube_pod_labels > 0
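For context, kube_pod_labels is a kube-state-metrics info metric whose value is always 1 and whose labels are the pod's Kubernetes labels prefixed with label_. The group_right join multiplies the phase value by 1 and keeps the labels from the kube_pod_labels side, roughly like this (sample series based on the pod above; the exact label set depends on your KSM version and flags):

kube_pod_labels{
  namespace="test-namespace",
  pod="appabc-766cbcb68d-29smr",
  label_APP="AppABC",
  label_TEAM="TeamABC"
} 1

# Join result: the binary operator drops the metric name, and with
# group_right the output labels come from the kube_pod_labels side:
{
  namespace="test-namespace",
  pod="appabc-766cbcb68d-29smr",
  label_APP="AppABC",
  label_TEAM="TeamABC"
} 1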
So my new alert rule looks like this:
- name: Pod_Failed
  rules:
    - alert: pod_failed
      expr: kube_pod_status_phase{phase="Failed"} * on (pod,namespace) group_right kube_pod_labels > 0
      labels:
        appname: '{{ $labels.label_APP }}'   # This is what I wanted to capture
        teamname: '{{ $labels.label_TEAM }}' # This is what I wanted to capture
      annotations:
        summary: 'Pod: {{ $labels.pod }} is down'
        description: 'Pod: {{ $labels.pod }} is down in the {{ $labels.namespace }} namespace.'
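One caveat: on kube-state-metrics v2 and later, kube_pod_labels no longer includes arbitrary pod labels by default, so you have to opt in per resource via an allowlist. For the APP and TEAM labels above, the KSM container args would need something like this (flag value assumed from the labels in this question):

args:
  - --metric-labels-allowlist=pods=[APP,TEAM]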