Metric Absence Alert on Logs-based Metrics from Pod Triggered on Pod Reschedule

9/20/2017

Setup

Note: Using pseudo-code instance notation: ObjectType("<name>", [<attr>: <attr-value>]).

We have a Container: Container("k8s-snapshots") in a Pod("k8s-snapshots-0"), managed by a StatefulSet("k8s-snapshots", spec.replicas: 1).

We expect at most 1 Pod to run at any point in time.

We have a Logs-based Counter Metric("k8s-snapshots/snapshot-created") with the filter:

resource.type="container"
resource.labels.cluster_name="my-cluster"
logName="projects/my-project/logs/k8s-snapshots"
jsonPayload.event:"snapshot.created"
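
For reference, a minimal sketch of creating such a logs-based counter metric with the google-cloud-logging Python client (the project ID, description, and client version are assumptions; the filter is the one above):

from google.cloud import logging

# Assumed project ID; the filter is copied from the question.
client = logging.Client(project="my-project")

snapshot_filter = (
    'resource.type="container" '
    'resource.labels.cluster_name="my-cluster" '
    'logName="projects/my-project/logs/k8s-snapshots" '
    'jsonPayload.event:"snapshot.created"'
)

metric = client.metric(
    "k8s-snapshots/snapshot-created",
    filter_=snapshot_filter,
    description="Counts snapshot.created events from the k8s-snapshots container.",
)
metric.create()  # Registers the counter under the logging.googleapis.com/user/ prefix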

We have a Stackdriver Policy:

Policy(
  Name: "snapshot metric absent",
  Condition: Condition(
    Metric("k8s-snapshots/snapshot-created"),
    is absent for: "more than 30 minutes"
  )
)

The policy is intended to alert us when Container("k8s-snapshots") has stopped creating snapshots.
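
A minimal sketch of the same policy expressed against the Monitoring API with the google-cloud-monitoring Python client (the project ID is an assumption, and the client shown is the current proto-plus version, which postdates this question; user logs-based metrics surface under the logging.googleapis.com/user/ prefix):

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="snapshot metric absent",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="snapshot-created is absent",
            condition_absent=monitoring_v3.AlertPolicy.Condition.MetricAbsence(
                filter=(
                    'metric.type="logging.googleapis.com/user/'
                    'k8s-snapshots/snapshot-created" '
                    'resource.type="container"'
                ),
                duration={"seconds": 1800},  # "more than 30 minutes"
            ),
        )
    ],
)

client.create_alert_policy(name="projects/my-project", alert_policy=policy)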

Expected result

An alert is triggered only if no instance of Pod("k8s-snapshots-0") has logged any event matching Metric("k8s-snapshots/snapshot-created") for more than 30 minutes.

Result

Policy(Name: "snapshot metric absent") is violated each time Pod("k8s-snapshots-0") is rescheduled.

It appears that a separate time series (a sub-metric) of the logs-based metric is created for each instance of Pod("k8s-snapshots-0"), presumably because the monitored-resource labels differ per Pod, and Stackdriver evaluates the absence condition against each time series individually. The series belonging to the old Pod stops receiving data as soon as the Pod is rescheduled, so the policy fires even though the new Pod keeps logging snapshot events.
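
One way to check this would be to list the time series behind the metric: if a separate series exists per Pod instance, each one shows up with its own resource labels. A rough sketch, assuming a current google-cloud-monitoring client, the project ID from above, and a six-hour lookback window:

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 6 * 3600}}
)

series = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'metric.type="logging.googleapis.com/user/'
            'k8s-snapshots/snapshot-created"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.HEADERS,
    }
)

# Each entry is one time series; differing per-Pod resource labels would
# confirm that the metric is split into one series per Pod instance.
for ts in series:
    print(ts.resource.type, dict(ts.resource.labels))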

-- joar
google-cloud-monitoring
google-cloud-platform
google-cloud-stackdriver
google-kubernetes-engine
stackdriver

1 Answer

1/19/2018

Are you still experiencing the issue? With the Stackdriver API you can aggregate metrics (including custom metrics), which the UI does not support yet. You can also visit this link

-- KarthickN
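
If the answer is referring to the aggregation options on alert-policy conditions, a hedged sketch of what that could look like: summing the counter across all Pods before the absence check, so a rescheduled Pod no longer leaves behind a silent per-Pod series. The aligner, alignment period, and overall approach are assumptions about the answer's intent, not a confirmed fix:

from google.cloud import monitoring_v3

# Reduce across resources so the absence condition sees one combined
# series instead of one series per Pod instance.
aggregation = monitoring_v3.Aggregation(
    alignment_period={"seconds": 300},
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
)

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="snapshot-created absent (summed across Pods)",
    condition_absent=monitoring_v3.AlertPolicy.Condition.MetricAbsence(
        filter=(
            'metric.type="logging.googleapis.com/user/'
            'k8s-snapshots/snapshot-created"'
        ),
        aggregations=[aggregation],
        duration={"seconds": 1800},
    ),
)
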
Source: StackOverflow