I have a Kubernetes cluster with kube-prometheus-stack installed (Prometheus 2.27.1, kube-state-metrics v2.0.0).
I would like a query that returns how long each pod was running over the last 24 hours.
Importantly, I need exactly the time the pod existed, as opposed to CPU usage.
I can do something like this with:
kube_pod_completion_time - kube_pod_created
but it returns nothing for pods that are still running. And, since an instant query in Prometheus only looks back 5 minutes by default, it will not report anything for pods that were terminated and deleted more than 5 minutes ago.
How would I query Prometheus without these issues?
One working solution is this:
sum by (namespace, pod) (
  (
    last_over_time(kube_pod_completion_time[1d])
    - last_over_time(kube_pod_created[1d])
  )
  or
  (time() - kube_pod_created)
)
The first part inside sum handles the case of pods that have terminated. We pick the last values of kube_pod_completion_time and kube_pod_created from the past day and compute the difference.
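The second part handles the pods that are still running. In that case, there is a fresh value of the kube_pod_created metric, and we can subtract it from the current time. The or operator only falls back to this second part for pods that have no result from the first part.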
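To make the two branches easier to follow, here is a minimal Python sketch of the same per-pod logic. It does not talk to Prometheus. The function name pod_lifetime_seconds, the pod names, and the timestamps are all hypothetical and only illustrate how the query picks between the two cases.

```python
from typing import Optional


def pod_lifetime_seconds(created: float, completed: Optional[float], now: float) -> float:
    """Mirror the query: use completed - created when a completion time
    exists, otherwise fall back to now - created (pod still running)."""
    if completed is not None:
        return completed - created
    return now - created


# Hypothetical sample data: Unix timestamps in seconds.
now = 1_700_000_000.0
pods = {
    # (namespace, pod): (created, completed)
    ("default", "batch-job-abc"): (now - 7200.0, now - 3600.0),  # terminated pod
    ("default", "web-xyz"): (now - 1800.0, None),                # still running
}

lifetimes = {
    key: pod_lifetime_seconds(created, completed, now)
    for key, (created, completed) in pods.items()
}

for (namespace, pod), seconds in lifetimes.items():
    print(f"{namespace}/{pod}: {seconds:.0f} s")
```

As in the query, a pod with a completion time always uses the first branch, even if its kube_pod_created sample is still fresh.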