Joining max_over_time with another metric in Prometheus returns no output

1/14/2020

prometheus:v2.15.2 kubernetes:v1.14.9

I have a query that returns the maximum CPU usage over a set period. I would like to join it with the value already configured in the kube_pod_container_resource_requests_cpu_cores metric.

I would like to know whether the usage is close to the configured value, expressed as a percentage.

I have other queries with this same structure that work:

jvm_memory_bytes_used{instance="url.instance.com.br"} / jvm_memory_bytes_max{area="heap"} * 100 > 80

But this one does not work:

max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70

My first idea was to create a query that collects the maximum historical CPU usage of a container in a pod over a short period:

max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s])

Element: {} Value: 0.25781324101515

If we execute it this way:

container_cpu_usage_seconds_total{pod="xpto-92838241",container_name!="POD", container_name!=""}

Element: container_cpu_usage_seconds_total{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="t3.small",beta_kubernetes_io_os="linux",cluster="teste.k8s.xpto",container="xpto",container_name="xpto",cpu="total",failure_domain_beta_kubernetes_io_region="sa-east-1",failure_domain_beta_kubernetes_io_zone="sa-east-1c",generic="true",id="/kubepods/burstable/poda9999e9999e999e9-/99999e9999999e9",image="nginx",instance="kubestate-dev.internal.xpto",job="kubernetes-cadvisor",kops_k8s_io_instancegroup="nodes",kubernetes_io_arch="amd64",kubernetes_io_hostname="ip-99-999-9-99.sa-east-1.compute.internal",kubernetes_io_os="linux",kubernetes_io_role="node",name="k8s_nginx_nginx-99999e9999999e9",namespace="nmpc",pod="pod-92838241",pod_name="pod-92838241",spot="false"} Value: 22533.2

Now we have what is configured:

kube_pod_container_resource_requests_cpu_cores{pod="xpto-92838241"}

Element: kube_pod_container_resource_requests_cpu_cores{container="xpto",instance="kubestate-dev.internal.xpto",job="k8s-http",namespace="nmpc",node="ip-99-999-999-99.sa-east-1.compute.internal",pod="pod-92838241"} Value: 1

As I understand it, I should be able to combine these two metrics to get the percentage, like this:

max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="xpto-dev-92838241",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / kube_pod_container_resource_requests_cpu_cores * 100 < 70

Element: no data Value:

But these two metrics do not interact. I cannot understand why, and I cannot find an explanation in the documentation.

Regards

-- Vinicius Peres
kubernetes
prometheus
prometheus-operator
promql

2 Answers

1/15/2020

The cAdvisor metric labels pod_name and container_name were only removed, and replaced by pod and container respectively, in Kubernetes 1.16. As you are using Kubernetes 1.14, you should still use pod_name and container_name.
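
As a minimal sketch of that advice, the usage side of the query could select and group on the older label names. This assumes your cAdvisor series carry pod_name and container_name (the sample output in the question shows both the old and the new labels, in which case either set should work); "pod-name-here" is a placeholder.

```promql
# Assumption: pre-1.16 cAdvisor label names (pod_name, container_name).
# Per-pod CPU usage in cores, averaged over the last minute.
sum by (pod_name) (
  rate(container_cpu_usage_seconds_total{pod_name="pod-name-here", container_name!="POD", container_name!=""}[1m])
)
```

Grouping with "by (pod_name)" keeps that label on the result, which matters if the result is later matched against another metric.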

Let me know if it helps.

-- mario
Source: StackOverflow

1/16/2020

For reference, here are the Prometheus Operator, its documentation, and a blog post that walks through CPU aggregation.

I solved my problem with vector matching:

max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here",container_name!="POD", container_name!=""}[1m])) [1h:1s]) / on(pod_name) group_left(container_name) kube_pod_container_resource_requests_cpu_cores{pod="pod-name-here"}
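
Some background on why the original expression returned no data: sum() without a "by" clause drops every label, so the left-hand side is the single series {}, while kube_pod_container_resource_requests_cpu_cores still carries container, instance, job, namespace, node and pod. Without an on() or ignoring() modifier, a binary operator only pairs series whose label sets are identical, so nothing matched. In the expression above, neither side has a pod_name label, so on(pod_name) pairs them on an empty value. A more explicit variant, sketched here under the assumption that both metrics expose a "pod" label with the same value, keeps that label on both sides and matches on it:

```promql
# Assumption: both metrics share a "pod" label with matching values.
# Left: peak per-pod CPU usage (cores) over the last hour.
# Right: total CPU requested by the pod's containers (cores).
max_over_time(
  sum by (pod) (
    rate(container_cpu_usage_seconds_total{pod="pod-name-here", container_name!="POD", container_name!=""}[1m])
  )[1h:1s]
)
/ on(pod)
sum by (pod) (kube_pod_container_resource_requests_cpu_cores{pod="pod-name-here"})
* 100
```

Appending "< 70" to this, as in the question, would keep only pods whose peak usage is below 70% of the request. Dropping the pod="pod-name-here" selectors should give one result per pod, since the match is then done per pod label.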

Thank you all.

-- Vinicius Peres