I have a Stackdriver logs-based metric that tracks GKE pod restarts.
I'd like to be alerted via email if the number of restarts breaches a predefined threshold.
I'm unsure what threshold I need to set in order to trigger the alert in Stackdriver. I have three pods deployed via a service.
GKE already sends Stackdriver a metric called container/restart_count. You just need to create an alerting policy as described in Managing alerting policies. According to the official docs, this metric exposes:
Number of times the container has restarted. Sampled every 60 seconds.
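Because restart_count reports how many times a container has restarted so far rather than how many restarts happened in the last minute (assuming it behaves as a cumulative counter, as the description above suggests), the threshold should be compared against the increase over a window, not the raw value. A minimal Python sketch of that calculation; the pod names and sample values are made up for illustration:

```python
from typing import Dict, List


def restarts_in_window(samples: List[int]) -> int:
    """Return how many restarts happened across time-ordered restart_count samples.

    A drop in the value is treated as a counter reset (e.g. the pod was replaced),
    so the new value is counted as restarts since the reset.
    """
    total = 0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter went down: assume it restarted from zero and grew to `curr`.
            total += curr
    return total


# Made-up one-minute samples for three hypothetical pods.
pods: Dict[str, List[int]] = {
    "pod-a": [2, 2, 3, 3, 3],
    "pod-b": [0, 0, 0, 0, 0],
    "pod-c": [5, 6, 0, 1, 1],
}

per_pod = {name: restarts_in_window(s) for name, s in pods.items()}
print(per_pod)                # {'pod-a': 1, 'pod-b': 0, 'pod-c': 2}
print(sum(per_pod.values()))  # 3
```

In the alerting UI, choosing a delta or rate aligner should do this per-window arithmetic for you; the sketch only shows what the threshold ends up being compared against.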
You should use the Logs Viewer and create a filter:
As the resource, choose GKE Cluster Operations and add a filter.
The filter might look like this:
resource.type="k8s_cluster"
resource.labels.cluster_name="<CLUSTER_NAME>"
resource.labels.location="<CLUSTER_LOCATION>"
jsonPayload.reason="Killing"
After that, create a custom metric by clicking the Create metric button.
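The filter and the metric can also be created from code. A minimal Python sketch, assuming the google-cloud-logging package and application default credentials are available; `build_killing_filter`, `create_restart_metric`, the metric name `pod_restarts` and the cluster values are hypothetical names used only for illustration:

```python
def build_killing_filter(cluster_name: str, location: str) -> str:
    """Return the Logs Viewer filter from this answer with the placeholders filled in.

    Separate lines in a Logging query are combined with an implicit AND.
    """
    return "\n".join([
        'resource.type="k8s_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.location="{location}"',
        'jsonPayload.reason="Killing"',
    ])


def create_restart_metric(metric_name: str, log_filter: str) -> None:
    """Create a logs-based counter metric with the google-cloud-logging client."""
    # Imported lazily so build_killing_filter works without the package installed.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()
    metric = client.metric(
        metric_name,
        filter_=log_filter,
        description="GKE pod Killing events",
    )
    if not metric.exists():
        metric.create()


if __name__ == "__main__":
    log_filter = build_killing_filter("my-cluster", "europe-west1-b")
    print(log_filter)
    # create_restart_metric("pod_restarts", log_filter)  # needs credentials
```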
Then you can choose Create alert from metric by clicking the newly created metric in Logs-based metrics.
Then set up the configuration for the trigger, the condition and the threshold.
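The same trigger/condition/threshold settings can be expressed as an alerting policy body. This is a sketch assuming the Cloud Monitoring v3 AlertPolicy JSON field names, a logs-based counter metric named `pod_restarts`, and an email notification channel that already exists; the project, channel ID, 5-minute window and `ALIGN_SUM` aligner are all illustrative choices, not values from the answer:

```python
import json

# Hypothetical values: replace with your own metric name and notification channel.
METRIC_NAME = "pod_restarts"
NOTIFICATION_CHANNEL = "projects/my-project/notificationChannels/1234567890"


def build_alert_policy(metric_name: str, threshold: float, channel: str) -> dict:
    """Return an AlertPolicy body that fires when the logs-based metric
    exceeds `threshold` within a 5-minute window."""
    return {
        "displayName": "GKE pod restarts above threshold",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"{metric_name} > {threshold}",
                "conditionThreshold": {
                    # User-defined logs-based metrics live under logging.googleapis.com/user/.
                    "filter": (
                        f'metric.type="logging.googleapis.com/user/{metric_name}" '
                        'AND resource.type="k8s_cluster"'
                    ),
                    "aggregations": [
                        {
                            # Add up the matching log entries in each 5-minute window.
                            "alignmentPeriod": "300s",
                            "perSeriesAligner": "ALIGN_SUM",
                        }
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # Fire as soon as one aligned window is above the threshold.
                    "duration": "0s",
                },
            }
        ],
        # Email channel that receives the alert.
        "notificationChannels": [channel],
    }


if __name__ == "__main__":
    policy = build_alert_policy(METRIC_NAME, 4, NOTIFICATION_CHANNEL)
    print(json.dumps(policy, indent=2))
```

Saved to a file, a body like this could be passed to `gcloud alpha monitoring policies create --policy-from-file=policy.json`, though the field set should be checked against the current Monitoring reference first.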
As for the right threshold, I would take the average number of restarts over a past time period and set the threshold a bit above it.
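That rule of thumb can be written down as a small helper. This is a sketch under assumptions: the 20% margin, the floor of 1 and the sample history are illustrative, and the history should be counted over the same window the alert uses (for example restarts per 5 minutes, summed over all three pods if the alert is for the whole service):

```python
import math
from statistics import mean
from typing import Sequence


def suggest_threshold(history: Sequence[int], margin: float = 0.2, minimum: int = 1) -> int:
    """Return a restart threshold a bit above the historical average.

    `history` holds restart counts per window; it must not be empty
    (statistics.mean raises StatisticsError otherwise).
    """
    baseline = mean(history)
    # Round before ceil so float noise such as 5.999999... does not bump the result.
    raised = round(baseline * (1 + margin), 9)
    return max(minimum, math.ceil(raised))


# Hypothetical restarts per window from a quiet past period.
print(suggest_threshold([2, 3, 1, 4, 2, 3, 6]))  # 4
```

With a "greater than" condition, this alerts once a window sees more than 4 restarts; the floor keeps a perfectly quiet history from producing a threshold of 0 that would fire on every restart.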