I want to calculate the cpu usage of all pods in a kubernetes cluster. I found two metrics in prometheus may be useful:
container_cpu_usage_seconds_total: Cumulative cpu time consumed per cpu in seconds.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.
Cpu Usage of all pods = increment per second of sum(container_cpu_usage_seconds_total{id="/"})/increment per second of sum(process_cpu_seconds_total)
However, I found every second's increment of container_cpu_usage{id="/"}
larger than the increment of sum(process_cpu_seconds_total)
. So the usage may be larger than 1...
Well you can use below query as well:
avg (rate (container_cpu_usage_seconds_total{id="/"}[1m]))
This I'm using to get CPU usage at cluster level:
sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
I also track the CPU usage for each pod.
sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)
I have a complete kubernetes-prometheus solution on GitHub, maybe can help you with more metrics: https://github.com/camilb/prometheus-kubernetes
I created my own prometheus exporter (https://github.com/google-cloud-tools/kube-eagle), primarily to get a better overview of my resource utilization on a per node basis. But it also offers a more intuitive way monitoring your CPU and RAM resources. The query to get the cluster wide CPU usage would look like this:
sum(eagle_pod_container_resource_usage_cpu_cores)
But you can also easily get the CPU usage by namespace, node or nodepool.