I have a Kubernetes cluster provisioned with kops, running on CoreOS workers. From time to time I see significant load spikes that correlate with I/O spikes reported in Prometheus via the node_disk_io_time_ms metric. The thing is, I seem to be unable to use any metric to pinpoint where this I/O workload actually originates. Metrics like container_fs_* seem to be useless, as I always get zero values for the actual containers and only get data for the node as a whole.
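For reference, the queries involved look roughly like this (a sketch only; the 5m window and the label names are assumptions, and older cAdvisor versions use pod_name/container_name rather than pod/container):

# node-level I/O time per second, where the spikes show up
rate(node_disk_io_time_ms[5m])

# per-container filesystem writes, which come back as zeros here
sum by (pod_name, container_name) (rate(container_fs_writes_bytes_total[5m]))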
Any hints on how I can approach the issue of locating what is to blame for the I/O load in a kube cluster / CoreOS node are very welcome.
If you are using the nginx ingress controller, you can configure it with:
enable-vts-status: "true"
This will give you a bunch of Prometheus metrics for each pod that has an ingress. The metric names start with nginx_upstream_.
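The setting goes into the controller's ConfigMap. A minimal sketch, assuming the commonly used nginx-configuration ConfigMap in the ingress-nginx namespace (adjust both names to your installation):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # enables the VTS module so per-upstream traffic metrics are exported
  enable-vts-status: "true"

Once applied, the controller's metrics endpoint exposes the nginx_upstream_* series for Prometheus to scrape.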
In case it is a cronjob creating the spikes, install the node-exporter DaemonSet and check the container_fs_* metrics.
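A sketch of the kind of queries that help rank pods by filesystem I/O, assuming the container_fs_* series are populated in your setup (the pod_name label is an assumption; newer versions use pod instead):

# top pods by bytes written to the filesystem per second
topk(10, sum by (pod_name) (rate(container_fs_writes_bytes_total[5m])))

# top pods by bytes read from the filesystem per second
topk(10, sum by (pod_name) (rate(container_fs_reads_bytes_total[5m])))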