I/O monitoring on Kubernetes / CoreOS nodes

9/26/2017

I have a Kubernetes cluster, provisioned with kops and running on CoreOS workers. From time to time I see significant load spikes that correlate with I/O spikes reported in Prometheus by the node_disk_io_time_ms metric. The problem is that I can't find any metric that pinpoints where this I/O workload actually originates. Metrics like container_fs_* seem to be useless, since I always get zero values for individual containers and only see data for the whole node.

Any hints on how to approach locating what is to blame for I/O load on a Kubernetes cluster / CoreOS node are very welcome.
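
For reference, these are roughly the expressions I am comparing (a sketch in Prometheus 2.x rule-file syntax; the group and rule names are arbitrary):

groups:
- name: io-debug                                   # arbitrary group name
  rules:
  # Per-device I/O utilisation on the node -- this is what spikes.
  - record: node:disk_io_utilisation:rate5m
    expr: rate(node_disk_io_time_ms[5m]) / 1000
  # Per-pod filesystem writes from cAdvisor -- this is what comes back as zero for me.
  - record: pod:fs_writes_bytes:rate5m
    expr: sum(rate(container_fs_writes_bytes_total[5m])) by (pod_name)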

-- Radek 'Goblin' Pieczonka
coreos
io
kubernetes
load
prometheus

1 Answer

10/29/2017

If you are using the nginx ingress controller, you can configure it with

enable-vts-status: "true"

This will give you a bunch of Prometheus metrics for each pod that has an Ingress. The metric names start with nginx_upstream_.
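
For example, assuming a standard nginx-ingress-controller install where the controller reads its settings from a ConfigMap (the name and namespace below are placeholders; use whatever your controller's --configmap flag points at):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # placeholder: the ConfigMap your controller watches
  namespace: ingress-nginx    # placeholder: adjust to your install
data:
  enable-vts-status: "true"

Once the controller picks up the change, the nginx_upstream_* series should appear on its Prometheus metrics endpoint.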

In case it is a cronjob creating the spikes, install the node-exporter DaemonSet and check the container_fs_* metrics.
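
A minimal DaemonSet sketch for node-exporter, assuming a "monitoring" namespace and a v0.15.x image (the series whose node_disk_io_time_ms metric matches what you are already graphing); adapt names, image tag, and apiVersion to your cluster:

apiVersion: extensions/v1beta1        # newer clusters use apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring               # assumption: your monitoring namespace
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.15.2   # assumption: pick the version you run
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys

Scrape port 9100 on each node (for example via a Prometheus kubernetes_sd_configs job with role: node) to pick the metrics up.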

-- cohadar
Source: StackOverflow