Note: Using pseudo-code instance notation:
ObjectType("<name>"[, <attr>: <attr-value>])
We have a Container("k8s-snapshots") in a Pod("k8s-snapshots-0") in a StatefulSet("k8s-snapshots", spec.replicas: 1).
We expect at most 1 Pod to run at any point in time.
We have a logs-based counter metric, Metric("k8s-snapshots/snapshot-created"), with the filter:
resource.type="container"
resource.labels.cluster_name="my-cluster"
logName="projects/my-project/logs/k8s-snapshots"
jsonPayload.event:"snapshot.created"
We have a Stackdriver Policy:
Policy(
    Name: "snapshot metric absent",
    Condition: Condition(
        Metric("k8s-snapshots/snapshot-created"),
        is absent for: "more than 30 minutes"
    )
)
The policy is there to monitor whether Container("k8s-snapshots") has stopped creating snapshots.
The expected behaviour is that an alert is triggered if no instance of Pod("k8s-snapshots-0") has logged any event matching Metric("k8s-snapshots/snapshot-created") for more than 30 minutes.
However, Policy(Name: "snapshot metric absent") is violated each time Pod("k8s-snapshots-0") is rescheduled.
It seems like a sub-metric of the main logs-based metric is created for each instance of Pod("k8s-snapshots"), and Stackdriver alerts on each sub-metric separately.
Are you still experiencing the issue? With the Stackdriver API you can aggregate metrics (you can also use custom metrics), which the UI does not support yet. You can also visit this link
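
To make the aggregation suggestion more concrete, here is a minimal sketch of what an absence condition with cross-series aggregation might look like as a request body for the Monitoring API's alert policy resource. It only builds and prints the JSON; it does not call the API. Several things here are assumptions rather than facts from the question: the helper name `build_absence_policy` is hypothetical, the monitored resource type `gke_container` is a guess for this cluster, and the metric type assumes the usual `logging.googleapis.com/user/` prefix for user-defined logs-based metrics. The idea is that reducing across all time series (no group-by fields) should leave one combined series, so a rescheduled Pod would not by itself make a series go absent.

```python
import json

# Assumed values: the metric name comes from the question, but the prefix
# and the resource type are assumptions and may differ for your project.
METRIC_TYPE = "logging.googleapis.com/user/k8s-snapshots/snapshot-created"
RESOURCE_TYPE = "gke_container"  # assumption; check your metric's resource type


def build_absence_policy(metric_type, resource_type, absent_seconds=1800):
    """Build an alert policy body with one metric-absence condition.

    Hypothetical helper for illustration. The time series are summed
    across all Pods so the condition watches a single combined series.
    """
    condition_filter = (
        f'metric.type="{metric_type}" AND resource.type="{resource_type}"'
    )
    return {
        "displayName": "snapshot metric absent",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "snapshot-created absent for 30 minutes",
                "conditionAbsent": {
                    "filter": condition_filter,
                    "duration": f"{absent_seconds}s",
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_SUM",
                            # Reduce across every series; an empty
                            # groupByFields list keeps no per-Pod labels.
                            "crossSeriesReducer": "REDUCE_SUM",
                            "groupByFields": [],
                        }
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_absence_policy(METRIC_TYPE, RESOURCE_TYPE), indent=2))
```

A body like this would be sent to the alert policy create method of the Monitoring API for your project. Whether it removes the false alerts on rescheduling is something to verify in your own setup.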