I am running different versions of our application in different namespaces, and I have set up a Prometheus and Grafana stack to monitor them. I use the PromQL query below to get the CPU usage of individual pods (as a percentage of one core), and the values it returns match what I get from kubectl top pods -n namespace:
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) by (pod)*100
The problem is that I want the total CPU usage of all pods in a namespace, cluster-wide. I tried different queries, but the values they return do not match the total I get from the PromQL query above or from kubectl top pods -n namespace.
The promql queries that I tried:
sum (rate (container_cpu_usage_seconds_total{namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) by (namespace)
sum (rate (container_cpu_usage_seconds_total{namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m]))
I am using the Singlestat panel for this. Under Visualization, in the Value section, I also tried different Show methods such as Average, Total, and Current, but none returned the correct value.
My question is how I can get the total cpu usage of all the pods in a namespace cluster-wide?
I did some research and found a few answers that could suit your needs:
In order to simply monitor CPU usage at cluster level use: sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
If you want to see %CPU usage for a namespace, you'll need to calculate the namespace's CPU usage first and then divide it by the CPU available in the cluster. It would look like this: sum (rate (container_cpu_usage_seconds_total{namespace="$Namespace"}[1m])) / sum(machine_cpu_cores) * 100
You can also use Prometheus' arbitrary labels in order to calculate CPU usage of a namespace. More details can be found here.
Finally, you can try a Prometheus exporter.
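To make the arithmetic behind these queries concrete, here is a minimal Python sketch. It is not PromQL, and per_second_rate is only a simplified stand-in for rate() (the real function also extrapolates and handles counter resets). The pod names, counter values, window, and core count are all made up for illustration.

```python
# Hypothetical samples of container_cpu_usage_seconds_total per pod,
# taken at the start and end of a 60-second window. All values are invented.
WINDOW_SECONDS = 60.0
CLUSTER_CORES = 4  # stands in for sum(machine_cpu_cores)

samples = {
    "app-v1-aaa": (1200.0, 1215.0),  # (counter at start, counter at end)
    "app-v1-bbb": (800.0, 830.0),
    "app-v1-ccc": (50.0, 95.0),
}


def per_second_rate(start, end, window=WINDOW_SECONDS):
    """Simplified stand-in for rate(): average counter increase per second."""
    return (end - start) / window


def pod_cores(pod_samples):
    """CPU cores used by each pod, like sum(rate(...)) by (pod)."""
    return {pod: per_second_rate(s, e) for pod, (s, e) in pod_samples.items()}


def namespace_cores(pod_samples):
    """Total CPU cores used by all pods, like sum(rate(...)) with no grouping."""
    return sum(pod_cores(pod_samples).values())


def percent_of_one_core(cores):
    """Same unit as the per-pod query that ends in *100."""
    return cores * 100


def percent_of_cluster(cores, total_cores=CLUSTER_CORES):
    """Same unit as sum(rate(...)) / sum(machine_cpu_cores) * 100."""
    return cores / total_cores * 100


if __name__ == "__main__":
    total = namespace_cores(samples)
    print("namespace usage in cores:", total)
    print("as percent of one core:", percent_of_one_core(total))
    print("as percent of the cluster:", percent_of_cluster(total))
```

One thing the sketch makes visible: the per-pod query in the question ends in *100, while the two namespace-level queries do not, so the first reports percent of one core and the others report cores. That difference in units may be part of why the totals did not appear to match.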
Please let me know if that helped.