In our Kubernetes cluster, we are running into sporadic situations where a cluster node runs out of memory and Linux invokes the OOM killer. Looking at the logs, it appears that the Pods scheduled onto the Node are requesting more memory than the Node can allocate.
The issue is that when the OOM killer is invoked, it prints a list of processes and their memory usage. However, since all of our Docker containers run Java services, the "process name" just appears as "java", which does not let us track down which particular Pod is causing the problem.
How can I get the history of which Pods were scheduled to run on a particular Node and when?
I guess your pods don't have requests and limits set, or the values are not ideal.
If you set this up properly, a pod that starts to use too much RAM will be killed, and you will be able to find out what is causing the issues.
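A minimal sketch of what that looks like in a pod spec (the name, image, and values below are placeholders, not taken from your setup):

apiVersion: v1
kind: Pod
metadata:
  name: java-service                               # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/java-service:1.0 # hypothetical image
      resources:
        requests:
          memory: "512Mi"                          # what the scheduler reserves on the node
          cpu: "250m"
        limits:
          memory: "1Gi"                            # the container is OOMKilled if it exceeds this
          cpu: "500m"

With a memory limit in place, the kernel's OOM killer acts inside the offending container's cgroup, and kubectl describe pod shows the last termination reason as OOMKilled, so you no longer have to guess which "java" process it was.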
As for seeing all the pods on a node, you can use kubectl get events or docker ps -a on the node, as mentioned in the other answers/comments.
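For example, assuming kubectl access to the cluster (the node name below is a placeholder):

# Pods currently scheduled on a given node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=my-node-1

# Recent events for that node (system OOM events reported by the kubelet typically show up here)
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node,involvedObject.name=my-node-1

Note that events are only kept for a limited time (one hour by default), so this gives you recent history rather than a complete scheduling log.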
You can now use the kube-state-metrics metric kube_pod_container_status_terminated_reason to detect OOM events:
kube_pod_container_status_terminated_reason{reason="OOMKilled"}

A matching series looks like this:

kube_pod_container_status_terminated_reason{container="addon-resizer",endpoint="http-metrics",instance="100.125.128.3:8080",job="kube-state-metrics",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled",service="kube-state-metrics"}
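As a sketch of how to alert on it, assuming a standard Prometheus rules file (the group and alert names are made up; on newer kube-state-metrics releases the equivalent metric is kube_pod_container_status_last_terminated_reason):

groups:
  - name: oom-alerts                    # hypothetical group name
    rules:
      - alert: PodOOMKilled
        expr: kube_pod_container_status_terminated_reason{reason="OOMKilled"} == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"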
We use Prometheus to monitor OOM events.
This expression should report the number of times that memory usage has reached the limits:
rate(container_memory_failcnt{pod_name!=""}[5m]) > 0
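If you want this broken down per pod, an aggregation over the same metric can help (note that on newer clusters the label is pod rather than pod_name):

sum by (namespace, pod_name) (rate(container_memory_failcnt{pod_name!=""}[5m])) > 0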
FYI: the next best thing to proper docs here is the code itself.
One way is to look at the docker ps -a output on the node and correlate the container names with your Pods' containers.
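For example, with the Docker runtime the kubelet labels every container with its pod metadata, and the container names follow the k8s_<container>_<pod>_<namespace>_... pattern, so on the node you can print the mapping directly (the pod name in the filter is a placeholder):

# Container name -> namespace/pod mapping, including exited containers
docker ps -a --format 'table {{.Names}}\t{{.Label "io.kubernetes.pod.namespace"}}/{{.Label "io.kubernetes.pod.name"}}\t{{.Status}}'

# Or narrow it down to the containers of a single pod
docker ps -a --filter 'label=io.kubernetes.pod.name=my-pod-123'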