I understand that with Prometheus we can set up alerting rules that detect and raise an alert if a pod crashes.
I want to understand how Prometheus itself knows when a pod has crashed or is stuck in the Pending state.
Put differently: I'm asking because I want to set up Prometheus to monitor pods I have already deployed, and to be alerted if a pod keeps crashing or is stuck in the Pending state. Can Prometheus detect these conditions without any modifications to the code inside the existing pods?
kube-state-metrics gathers information from kube-apiserver about the state of Kubernetes objects (such as pods, deployments, etc.), and it is bundled with prometheus-operator. To answer your question: the pod does not need to be up for you to scrape its status metrics; you gather them directly from the apiserver, by scraping the kube-state-metrics endpoint.
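As a minimal sketch of what that scrape configuration can look like (the service name, namespace, and port are assumptions; adjust them to your deployment, and note that prometheus-operator normally wires this up for you via a ServiceMonitor):

```yaml
# prometheus.yml (sketch) -- assumes kube-state-metrics is exposed as a
# Service named "kube-state-metrics" in the kube-system namespace on port 8080.
scrape_configs:
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc:8080']
```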
To see what pod-level metrics kube-state-metrics exposes, check: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md
Per the answer above, you can use the kube_pod_container_status_waiting_reason metric, or, if you just want to alert on a threshold regardless of the reason, kube_pod_container_status_waiting. Use sum(kube_pod_container_status_waiting_reason) by (reason) to list all the container waiting reasons, if any.
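Putting those metrics to work, here is a hedged sketch of alerting rules covering both cases you asked about: crash-looping pods (via the waiting reason) and pods stuck in Pending (via kube_pod_status_phase, another kube-state-metrics series). The rule names, durations, and severities are assumptions to tune for your setup:

```yaml
# alert-rules.yml (sketch) -- rule names and thresholds are illustrative.
groups:
  - name: pod-health
    rules:
      # Fires when a container has been waiting in CrashLoopBackOff for 5 minutes.
      - alert: PodCrashLooping
        expr: sum by (namespace, pod) (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash-looping"
      # Fires when a pod has sat in the Pending phase for 10 minutes.
      - alert: PodStuckPending
        expr: sum by (namespace, pod) (kube_pod_status_phase{phase="Pending"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is stuck in Pending"
```

Because all of this comes from kube-state-metrics, it works against your existing pods with no changes to the code running inside them.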
The common way for Prometheus to collect metrics and health information is by scraping, most often over an HTTP endpoint. Since pods can have multiple containers, it is best to scrape an HTTP endpoint of each running container.
If Prometheus does not receive a good response from this endpoint, it can determine that the container is down.
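Concretely, Prometheus records the synthetic up series for every scrape target (1 when the scrape succeeded, 0 when it failed), so a down container can be caught with a rule like this sketch (the job name "my-app" and the duration are hypothetical):

```yaml
groups:
  - name: scrape-health
    rules:
      # "up" is set to 0 by Prometheus itself when a scrape target is unreachable.
      - alert: TargetDown
        expr: up{job="my-app"} == 0   # "my-app" is an assumed job name
        for: 2m
        annotations:
          summary: "Scrape target {{ $labels.instance }} is down"
```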
Prometheus itself evaluates the alerting rules, but it does not send notifications; that is normally delegated to Alertmanager.
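For completeness, a minimal sketch of how Prometheus is pointed at Alertmanager (the service address is an assumption):

```yaml
# prometheus.yml (sketch) -- Prometheus evaluates the rules and forwards
# firing alerts to Alertmanager, which handles grouping and notification.
rule_files:
  - alert-rules.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.monitoring.svc:9093']  # assumed address
```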