I'm encountering a situation where pods are occasionally getting evicted after running out of memory. Is there any way to set up some kind of alerting where I can be notified when this happens?
As it is, Kubernetes keeps doing its job and re-creating pods after the old ones are removed, and it's often hours or days before I'm made aware that a problem exists at all.
GKE exports Kubernetes Events (kubectl get events) to Stackdriver Logging, to the "GKE Cluster Operations" table:
Next, write a query specifically targeting evictions (the query I pasted below might not be accurate):
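Since the query itself did not survive here, the following is only a rough sketch of what an eviction filter could look like, not the original one. It assumes that exported events land in a log named "events" and that the event's reason is exposed as jsonPayload.reason; both are assumptions, so open one eviction entry in the log viewer and adjust the field names to match what you actually see.

```
log_id("events")
jsonPayload.reason="Evicted"
```

Each line is an implicit AND. If this returns nothing, try the second line on its own first, then narrow the results down by cluster or namespace once matching entries show up.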
Then click the "CREATE METRIC" button.
This creates a logs-based metric. In the left sidebar, click "Logs-based metrics", then choose "Create alert from metric" from this metric's context menu:
Next, you'll be taken to the Stackdriver Alerting portal, where you can set up an alert on the metric, for example one that fires when the eviction count crosses a threshold.