I have a working Kubernetes cluster that I want to monitor with Grafana.
I have been trying out many dashboards from https://grafana.com/dashboards but they all seem to have some problems: it looks like there's a mismatch between the Prometheus metric names and what the dashboard expects.
For example, with this recently released and quite popular dashboard, https://grafana.com/dashboards/5309/revisions, I end up with many "holes" when running it.
Looking into the panel configuration, I see that the issues come from small key changes, e.g. node_memory_Buffers instead of node_memory_Buffers_bytes.
Similarly, the dashboard expects node_disk_bytes_written when Prometheus provides node_disk_written_bytes_total.
I have tried out a lot of Kubernetes-specific dashboards and I have the same problem with almost all of them.
Am I doing something wrong?
The Prometheus node exporter changed a lot of the metric names in the 0.16.0 version to conform to new naming conventions.
From https://github.com/prometheus/node_exporter/releases/tag/v0.16.0:
Breaking changes
This release contains major breaking changes to metric names. Many metrics have new names, labels, and label values in order to conform to current naming conventions.
- Linux node_cpu metrics now break out guest values into separate metrics.
- Many counter metrics have been renamed to include _total.
- Many metrics have been renamed/modified to include base units, for example node_cpu is now node_cpu_seconds_total.
See also the upgrade guide. One of its suggestions is to use compatibility rules that create duplicate metrics with the old names.
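To illustrate what such compatibility rules look like, here is a minimal Python sketch that generates a Prometheus recording-rules file exposing the new metrics under their old names. It only covers the three renames quoted in this thread, the group name node_exporter_compat is a made-up placeholder, and it does not handle label changes, so treat the rules file from the upgrade guide as the authoritative version.

```python
# Old metric name -> new metric name.
# Assumption: only the three renames mentioned in this thread are listed;
# label and label-value changes are NOT handled here.
COMPAT = {
    "node_memory_Buffers": "node_memory_Buffers_bytes",
    "node_disk_bytes_written": "node_disk_written_bytes_total",
    "node_cpu": "node_cpu_seconds_total",
}


def compat_rules_yaml(mapping, group="node_exporter_compat"):
    """Build the text of a recording-rules file with one rule per rename.

    Each rule records the new metric under its old name, so dashboards
    written against the old names keep finding data.
    The group name is a hypothetical placeholder.
    """
    lines = ["groups:", f"  - name: {group}", "    rules:"]
    for old, new in mapping.items():
        lines.append(f"      - record: {old}")
        lines.append(f"        expr: {new}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(compat_rules_yaml(COMPAT))
```

The printed text can be saved to a file and listed under rule_files in the Prometheus configuration. Note that this duplicates every listed series, which costs some storage.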
Alternatively, stay on version 0.15.x until the dashboards are updated, or fix the dashboards yourself!
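If you go the "fix them" route, the renames can be applied to an exported dashboard mechanically. The following is a rough sketch, not an official tool: it assumes the queries live in "expr" fields of the dashboard JSON, it only knows the three renames quoted in this thread, and the helper names rename_metrics and fix_dashboard are made up for illustration. Queries that depend on changed labels would still need manual attention.

```python
import re

# Old metric name -> new metric name (only the renames quoted in this thread).
RENAMES = {
    "node_memory_Buffers": "node_memory_Buffers_bytes",
    "node_disk_bytes_written": "node_disk_written_bytes_total",
    "node_cpu": "node_cpu_seconds_total",
}

# Match an old name only when it is a complete metric name, i.e. not
# preceded or followed by another metric-name character. This keeps
# already-renamed metrics such as node_cpu_seconds_total untouched.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_:])("
    + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True))
    + r")(?![A-Za-z0-9_:])"
)


def rename_metrics(expr):
    """Rewrite old metric names inside one PromQL expression string."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], expr)


def fix_dashboard(node):
    """Return a copy of the dashboard JSON with every "expr" string rewritten.

    Assumption: panel queries are stored under the key "expr"; other
    fields (for example template variable queries) are left as they are.
    """
    if isinstance(node, dict):
        return {
            key: rename_metrics(value)
            if key == "expr" and isinstance(value, str)
            else fix_dashboard(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_dashboard(item) for item in node]
    return node
```

Typical use would be to load the exported dashboard with json.load, pass it through fix_dashboard, and write the result back with json.dump before re-importing it into Grafana. Running it twice is harmless, since the new names no longer match the pattern.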