How to get the max CPU usage of a pod in Kubernetes over a time interval (say 30 days) in PromQL?

11/7/2019

I am trying to estimate the resource (CPU) request and limit values, so I want to know the maximum CPU usage of a pod over the last month using Prometheus.

I checked this question but couldn't get what I wanted: Generating range vectors from return values in Prometheus queries

I tried this, but it seems max_over_time doesn't work over rate:

max (  
  max_over_time(
    rate(
      container_cpu_usage_seconds_total[5m]
    )[30d]
  )
) by (pod_name)

invalid parameter 'query': parse error at char 64: range specification must be preceded by a metric selector, but follows a *promql.Call instead

-- mad_boy
kubernetes
prometheus
prometheus-operator
promql

2 Answers

1/14/2020

Please, try something like this:

max_over_time(sum(rate(container_cpu_usage_seconds_total{pod="pod-name-here-759b8f",container_name!="POD", container_name!=""}[1m])) [720h:1s])
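
The [720h:1s] suffix is a subquery: the part before the colon is the range, and the part after it is the step at which the inner expression is re-evaluated. A 1s step over 720h means roughly 2.6 million evaluations, so a coarser step may be more practical. A variant under that assumption (the 1m step is only an illustrative choice; the label names are the ones used in the query above and may differ in your cluster):

max_over_time(
  sum(
    rate(container_cpu_usage_seconds_total{pod="pod-name-here-759b8f", container_name!="POD", container_name!=""}[1m])
  )[720h:1m]
)

With a 1m rate window and a 1m step, consecutive evaluations still cover the whole range, while the number of evaluation points drops to about 43,200.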

-- Vinicius Peres
Source: StackOverflow

11/10/2019

You'd need to capture the inner expression (rate of container cpu usage) as a recording rule:

- record: container_cpu_usage_seconds_total:rate5m
  expr: rate(container_cpu_usage_seconds_total[5m])

then use this new timeseries to calculate max_over_time:

max (  
  max_over_time(container_cpu_usage_seconds_total:rate5m[30d])
) by (pod_name)

This is only needed in Prometheus versions older than 2.7; from 2.7 onward, subqueries can be calculated on the fly. See this blog post for more details.
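
A sketch of the subquery form mentioned above, applied to the query from the question. It assumes Prometheus 2.7 or later and the pod_name label used in the question; the 5m resolution after the colon is an arbitrary choice for illustration, not something taken from the blog post:

max by (pod_name) (
  max_over_time(
    rate(container_cpu_usage_seconds_total[5m])[30d:5m]
  )
)

Here the inner rate is re-evaluated every 5m across the 30d range, max_over_time picks the highest of those values for each series, and the outer max keeps the highest series per pod.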

Bear in mind, though, that if you're planning to use this composite query (max of max_over_time of data collected in the last 30 days) for alerting or visualisation (rather than a one-off query), you'd still want to use the recording rule to improve the query's performance. It's the classic CS computational complexity tradeoff: the memory/storage space required to store the recording rule as a separate timeseries vs. the computational resources needed to process 30 days of data on every evaluation.

-- ekarak
Source: StackOverflow