Prometheus query for Kubernetes pod uptime

11/11/2021

I have a Kubernetes cluster with kube-prometheus-stack installed (Prometheus 2.27.1, kube-state-metrics v2.0.0).

I would like a query that returns how long each pod was running over the last 24 hours:

  • If a pod is still running, the time from its creation to now
  • If a pod has terminated, the time from creation to completion

Importantly, I need exactly the time the pod existed, as opposed to CPU usage.

I can do something like this with:

kube_pod_completion_time - kube_pod_created

but it returns nothing for pods that are still running, because they have no kube_pod_completion_time. Also, since by default an instant query ignores samples that are more than 5 minutes old, it reports nothing for pods that were terminated and then deleted.

How would I query Prometheus without these issues?

-- Vladimir Prus
kubernetes
monitoring
prometheus

1 Answer

11/19/2021

One working solution is this:

sum by(namespace, pod) (
    (
      last_over_time(kube_pod_completion_time[1d])
      - last_over_time(kube_pod_created[1d])
    )
  or
    (time() - kube_pod_created)
)

The first part inside sum handles pods that have terminated. last_over_time picks the most recent value of kube_pod_completion_time and of kube_pod_created within the past day, and we compute the difference.

The second part handles the pods that are still running. In that case, there is a fresh value of the kube_pod_created metric, and we can subtract it from the current time.
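
To make the role of "or" concrete, here is a rough Python model of how the two parts combine. It is only a sketch of the semantics, not something Prometheus runs: the function name pod_uptimes and the dictionary inputs are hypothetical, and each series is assumed to be identified by (namespace, pod) alone, ignoring any other labels.

```python
from typing import Dict, Tuple

PodKey = Tuple[str, str]  # (namespace, pod); a simplification of the real label set


def pod_uptimes(
    created_last_1d: Dict[PodKey, float],    # models last_over_time(kube_pod_created[1d])
    completed_last_1d: Dict[PodKey, float],  # models last_over_time(kube_pod_completion_time[1d])
    created_now: Dict[PodKey, float],        # models the instant vector kube_pod_created
    now: float,                              # models time()
) -> Dict[PodKey, float]:
    # Left operand of "or": only pods present in both range lookups survive
    # the subtraction, i.e. pods that reported a completion time.
    terminated = {
        key: completed_last_1d[key] - created_last_1d[key]
        for key in completed_last_1d
        if key in created_last_1d
    }
    # Right operand of "or": pods that still have a fresh kube_pod_created sample.
    running = {key: now - created for key, created in created_now.items()}
    # "or" keeps every left-hand element and adds right-hand elements only
    # for label sets missing on the left, so terminated values win.
    return {**running, **terminated}


example = pod_uptimes(
    created_last_1d={("default", "a"): 1000.0, ("default", "b"): 2000.0, ("default", "c"): 500.0},
    completed_last_1d={("default", "a"): 1600.0},
    created_now={("default", "a"): 1000.0, ("default", "b"): 2000.0},
    now=5000.0,
)
print(example)
```

In this made-up data, pod a has completed, so it gets completion minus creation even though it still exists; pod b is running, so it gets now minus creation. Pod c was deleted without ever reporting a completion time, so under this model neither part covers it and it is absent from the result.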

-- Vladimir Prus
Source: StackOverflow