I'm having trouble understanding why I get multiple results for the same pod in Prometheus/Grafana.
I'm trying to get CPU usage with this query:
rate(container_cpu_usage_seconds_total{namespace=~".+-test", pod=~"my-server-.+", image!~"|.*pause.*", container!="POD"}[5m])
The container!="POD" matcher excludes the results whose container label is the string POD. I found that those refer to the pause container, which holds the namespace and other things before the container starts.
However, pause containers also show up in the image label, so I excluded them with a matcher on that label as well.
Then I found some containers without the image label, and I excluded them by adding an empty alternative (the leading | in "|.*pause.*") to the image matcher.
In some cases the CPU usage of the container without the image label is lower than that of the "correct" container (the one with the correct image and container labels), and in other cases it is very similar, but never the same.
Example:
I would like to understand what those containers are and what they refer to.
PS: the metrics come from cAdvisor.
Try this query:
rate(container_cpu_usage_seconds_total{container!="POD", container=~".+"}[5m])
In short, CPU usage is reported at several resolutions (container, pod, QoS class), and the query above eliminates everything except the containers that you defined explicitly. container!="POD" removes pause containers, and container=~".+" means "not empty". Only per-container series carry a non-empty container label; series at the other resolutions leave it empty, which is also why they show up without an image label.
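To make the filtering concrete, here is a minimal Python sketch that mimics how the two matchers act on a handful of series. The label sets are hypothetical (made up to resemble what cAdvisor exposes for one pod, not taken from a real cluster), and the helper names matches and select are illustrative, not part of any Prometheus client. It relies on two behaviours of Prometheus selectors: regex matchers are fully anchored, and a missing label is treated as an empty string.

```python
import re

# Hypothetical label sets for one pod; real clusters will differ in detail.
SERIES = [
    # The container you defined in the pod spec.
    {"pod": "my-server-0", "container": "my-server", "image": "example/my-server:1.0"},
    # The pause container.
    {"pod": "my-server-0", "container": "POD", "image": "example/pause:3.1"},
    # Pod-level aggregate: no container, no image.
    {"pod": "my-server-0", "container": "", "image": ""},
    # Higher-level cgroup (for example a QoS class): no pod either.
    {"pod": "", "container": "", "image": ""},
]


def matches(labels, name, op, value):
    """Evaluate one PromQL-style label matcher against a label set."""
    actual = labels.get(name, "")  # a missing label behaves like an empty one
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    # Regex matchers are fully anchored, hence fullmatch.
    full = re.fullmatch(value, actual) is not None
    if op == "=~":
        return full
    if op == "!~":
        return not full
    raise ValueError(f"unknown operator: {op}")


def select(series, matchers):
    """Keep only the series that satisfy every matcher."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]


# Matchers from the answer: container!="POD", container=~".+"
answer_kept = select(SERIES, [("container", "!=", "POD"), ("container", "=~", ".+")])

# Matchers from the question: image!~"|.*pause.*", container!="POD"
question_kept = select(SERIES, [("image", "!~", "|.*pause.*"), ("container", "!=", "POD")])

for s in answer_kept:
    print(s["pod"], s["container"], s["image"])
```

On this sample data both sets of matchers keep only the explicitly defined container. The matchers on the container label get there without depending on how images are named.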