Monitoring Kubernetes with Grafana: lots of missing data with latest Prometheus version

7/4/2018

I have a working Kubernetes cluster that I want to monitor with Grafana.

I have been trying out many dashboards from https://grafana.com/dashboards but they all seem to have some problems: it looks like there's a mismatch between the Prometheus metric names and what the dashboard expects.

For example, take this recently released, quite popular dashboard: https://grafana.com/dashboards/5309/revisions

I end up with many "holes" when running it:

[Image: Grafana dashboard with missing values]

Looking into the panel configuration, I see that the issues come from small differences in metric names, e.g. the dashboard queries node_memory_Buffers while Prometheus provides node_memory_Buffers_bytes.

Similarly the dashboard expects node_disk_bytes_written when Prometheus provides node_disk_written_bytes_total.

I have tried out a lot of Kubernetes-specific dashboards and I have the same problem with almost all of them.

Am I doing something wrong?

-- MasterScrat
grafana
kubernetes
prometheus

1 Answer

7/4/2018

The Prometheus node exporter renamed many of its metrics in version 0.16.0 to conform to new naming conventions, so dashboards written against older versions query names that no longer exist.

From https://github.com/prometheus/node_exporter/releases/tag/v0.16.0:

Breaking changes

This release contains major breaking changes to metric names. Many metrics have new names, labels, and label values in order to conform to current naming conventions.

  • Linux node_cpu metrics now break out guest values into separate metrics.
  • Many counter metrics have been renamed to include _total.
  • Many metrics have been renamed/modified to include base units, for example node_cpu is now node_cpu_seconds_total.

See also the upgrade guide. One of its suggestions is to use compatibility rules (Prometheus recording rules) that create duplicate metrics under the old names.

Otherwise, use node exporter 0.15.x until the dashboards are updated, or fix the dashboards yourself!
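
If you go the "fix them" route, the renames can be applied to an exported dashboard JSON in bulk. Below is a minimal sketch in Python, not an official tool. It assumes the dashboard stores its PromQL queries under "expr" keys (as Grafana's Prometheus panel targets do), and the rename table contains only the three pairs mentioned in this thread; the function names are made up for illustration. Since 0.16.0 also changed some labels and label values, a pure rename may not fix every panel.

```python
import json
import re

# Old name -> new name. Only the pairs mentioned in this thread;
# extend the table from the node_exporter 0.16.0 upgrade guide as needed.
RENAMES = {
    "node_memory_Buffers": "node_memory_Buffers_bytes",
    "node_disk_bytes_written": "node_disk_written_bytes_total",
    "node_cpu": "node_cpu_seconds_total",
}

# "_" counts as a word character, so \b stops node_cpu from matching
# inside an already-migrated name such as node_cpu_seconds_total.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_metrics(expr):
    """Replace old metric names in a single PromQL expression string."""
    return PATTERN.sub(lambda m: RENAMES[m.group(1)], expr)


def fix_dashboard(node):
    """Return a copy of a dashboard JSON tree with every "expr" string rewritten."""
    if isinstance(node, dict):
        return {
            key: rename_metrics(value)
            if key == "expr" and isinstance(value, str)
            else fix_dashboard(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [fix_dashboard(item) for item in node]
    return node
```

To use it, load the exported file with json.load, pass the result through fix_dashboard, write it back with json.dump, and re-import it into Grafana. Template variable queries are usually stored under a different key (often "query"), so those would need the same treatment.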

-- Scott Anderson
Source: StackOverflow