How does Prometheus know when a pod crashed?

7/18/2018

I understand that with Prometheus we can set up alerting rules that detect a crashed pod and raise an alert.

I want to understand how Prometheus itself knows when a pod has crashed or is stuck in the Pending state.

  • Does it find out when it tries to scrape metrics from the pod's HTTP endpoint?

OR

  • Does Prometheus get the pod status information from Kubernetes?

I'm asking because I want to set up Prometheus to monitor pods I have already deployed. I want to be alerted if a pod keeps crashing or is stuck in the Pending state, and I want to know whether Prometheus can detect these conditions without any changes to the code inside the existing pods.

-- BlueChips23
kubernetes
kubernetes-pod
prometheus
prometheus-alertmanager

3 Answers

10/1/2019

kube-state-metrics gathers information from kube-apiserver about the state of Kubernetes objects (such as pods, deployments, etc.). It is packaged with prometheus-operator. To answer your question: you do not need the pod to be up to get its status metrics. They come from the apiserver, and Prometheus collects them by scraping the kube-state-metrics endpoint.
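To make the "status comes from the apiserver, not from the pod" point concrete, here is a minimal sketch that reads the kind of text kube-state-metrics exposes and picks out pods stuck in Pending. It assumes the kube_pod_status_phase gauge with namespace, pod and phase labels (value 1 for the pod's current phase); the sample text and pod names are hypothetical, and the parser is deliberately simplified (no escaped quotes, no label-less metrics, no timestamps). In PromQL the same check is roughly kube_pod_status_phase{phase="Pending"} == 1.

```python
import re

# Hypothetical sample of kube-state-metrics output; pod names are made up.
SAMPLE = """\
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="worker-7",phase="Running"} 0
kube_pod_status_phase{namespace="default",pod="worker-7",phase="Pending"} 1
"""

# Simplified match for one sample line: name{labels} value
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # line shape not handled by this simplified parser
        name, raw_labels, value = m.groups()
        yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def pending_pods(text):
    """Return (namespace, pod) pairs whose Pending phase gauge is 1."""
    return [
        (labels["namespace"], labels["pod"])
        for name, labels, value in parse_samples(text)
        if name == "kube_pod_status_phase"
        and labels.get("phase") == "Pending"
        and value == 1
    ]


print(pending_pods(SAMPLE))  # [('default', 'worker-7')]
```

Note that nothing here talks to the pods themselves; the pod could be unschedulable and the signal would still be present.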

To see which pod-level metrics kube-state-metrics makes available, check: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md

As the other answer suggests, you can use the kube_pod_container_status_waiting_reason metric, or, if you just want to alert on a threshold regardless of the reason, kube_pod_container_status_waiting.
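The difference between the two options is whether you filter on the reason label. The sketch below evaluates that condition over already-parsed samples of kube_pod_container_status_waiting_reason (value 1 for the reason a waiting container is currently in). The sample data and the ALERT_REASONS set are hypothetical choices, not a recommendation; the filtered case corresponds roughly to the PromQL expression kube_pod_container_status_waiting_reason{reason=~"CrashLoopBackOff|ImagePullBackOff|ErrImagePull"} == 1.

```python
# Hypothetical, already-parsed samples of kube_pod_container_status_waiting_reason.
SAMPLES = [
    ({"namespace": "default", "pod": "api-0", "container": "api",
      "reason": "CrashLoopBackOff"}, 1.0),
    ({"namespace": "default", "pod": "api-1", "container": "api",
      "reason": "ContainerCreating"}, 1.0),
    ({"namespace": "batch", "pod": "job-3", "container": "main",
      "reason": "ImagePullBackOff"}, 1.0),
]

# Reasons we choose to alert on; this set is an assumption, adjust to taste.
ALERT_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def waiting_alerts(samples, reasons=None):
    """Return (namespace, pod, container, reason) for each waiting container.

    With reasons=None every waiting container is reported, which is roughly
    what alerting on kube_pod_container_status_waiting (no reason filter) gives.
    """
    hits = []
    for labels, value in samples:
        if value != 1:
            continue  # series not active for this container
        if reasons is not None and labels["reason"] not in reasons:
            continue  # waiting, but for a reason we don't alert on
        hits.append((labels["namespace"], labels["pod"],
                     labels["container"], labels["reason"]))
    return hits


print(waiting_alerts(SAMPLES, ALERT_REASONS))
```

Filtering by reason avoids alerting on short, normal waits such as ContainerCreating, while the unfiltered version catches anything stuck, at the cost of more noise.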

-- Christina A
Source: StackOverflow

10/1/2019

Use sum(kube_pod_container_status_waiting_reason) by (reason) to get the number of waiting containers for each reason, if there are any.
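Because each active waiting_reason series has the value 1, summing by the reason label counts containers per reason. The snippet below is a plain-Python sketch of that aggregation over hypothetical, already-parsed samples; it only illustrates what the query computes and is not how Prometheus implements it.

```python
from collections import Counter

# Hypothetical samples of kube_pod_container_status_waiting_reason.
SAMPLES = [
    ({"pod": "api-0", "container": "api", "reason": "CrashLoopBackOff"}, 1.0),
    ({"pod": "api-1", "container": "api", "reason": "CrashLoopBackOff"}, 1.0),
    ({"pod": "job-3", "container": "main", "reason": "ImagePullBackOff"}, 1.0),
]


def sum_by_reason(samples):
    """Sum sample values grouped by the `reason` label, dropping other labels."""
    totals = Counter()
    for labels, value in samples:
        totals[labels["reason"]] += value
    return dict(totals)


print(sum_by_reason(SAMPLES))  # {'CrashLoopBackOff': 2.0, 'ImagePullBackOff': 1.0}
```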

-- Kumail Haider
Source: StackOverflow

7/18/2018

The usual way for Prometheus to collect metrics and health information is scraping, most commonly through an HTTP endpoint. Since a pod can have multiple containers, it is best to scrape an HTTP endpoint exposed by the container you are running.

If Prometheus doesn't receive a successful response from this endpoint, it marks the target as down (its up metric is set to 0), so it can infer that the container is not serving.
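The scrape-based approach boils down to "did the endpoint answer?". The sketch below imitates that with a probe that returns 1 or 0, in the spirit of Prometheus' up metric. It is a simplification under stated assumptions: the URLs, the demo handler and the helper names (probe_up, start_demo_server, closed_port) are hypothetical, and real Prometheus also parses the response body and applies its own timeouts.

```python
import http.client
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe_up(url, timeout=2.0):
    """Return 1 if the endpoint answers with HTTP 200, else 0 (like `up`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 1 if resp.status == 200 else 0
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and non-2xx (HTTPError) all land here;
        # URLError and HTTPError are subclasses of OSError.
        return 0


class MetricsHandler(BaseHTTPRequestHandler):
    """Stand-in for a container's /metrics endpoint (demo only)."""

    def do_GET(self):
        body = b"demo_metric 1\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Serve MetricsHandler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def closed_port():
    """Return a local port that nothing is listening on (simulates a dead container)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

For example, probing start_demo_server()'s port gives 1, while probing closed_port() gives 0. Note what this cannot tell you: a failed probe says the endpoint is unreachable, not whether the pod crashed, is restarting, or was never scheduled.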

Prometheus evaluates alerting rules itself but does not send notifications; you normally delegate that to Alertmanager.
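The part that does stay inside Prometheus is rule evaluation: on each evaluation an alert is inactive, pending (condition true, but not yet for the rule's `for` duration) or firing, and firing alerts are what get handed to Alertmanager for grouping and routing. The following is a simplified model of that state change, not Prometheus code; the rule name PodCrashLooping and the 300-second window are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Simplified model of how one alerting rule's state changes between evaluations."""
    name: str
    for_seconds: float                    # the rule's `for` duration
    active_since: Optional[float] = None  # when the expression first returned a result

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation at time `now`."""
        if not condition_true:
            # Expression returned nothing: the alert is cleared.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        # Only after the condition has held for the whole `for` window does it fire.
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertRule(name="PodCrashLooping", for_seconds=300)
timeline = [(0, True), (60, True), (300, True), (360, False)]
states = [rule.evaluate(cond, t) for t, cond in timeline]
print(states)  # ['pending', 'pending', 'firing', 'inactive']
```

The `for` window is what keeps a single short-lived CrashLoopBackOff or Pending blip from paging anyone.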

-- Bal Chua
Source: StackOverflow