I have a GKE cluster and I'd like to keep track of the ratio between the total memory requested and the total memory allocatable. I was able to create a chart in Google Cloud Monitoring using
metric.type="kubernetes.io/container/memory/request_bytes" resource.type="k8s_container"
and
metric.type="kubernetes.io/node/memory/allocatable_bytes" resource.type="k8s_node"
both with crossSeriesReducer set to REDUCE_SUM in order to get the aggregate total across the cluster.
Then, when I tried to set up an alerting policy (using the Cloud Monitoring API) with the ratio of the two (following this), I got this error:
ERROR: (gcloud.alpha.monitoring.policies.create) INVALID_ARGUMENT: The numerator and denominator must have the same resource type.
It doesn't like that the first metric is a k8s_container and the second metric is a k8s_node.
Are there different metrics I can use or some sort of workaround in order to alert on memory request/allocatable ratio in Google Cloud Monitoring?
EDIT:
Here is the full request and response:
$ gcloud alpha monitoring policies create --policy-from-file=policy.json
ERROR: (gcloud.alpha.monitoring.policies.create) INVALID_ARGUMENT: The numerator and denominator must have the same resource type.
$ cat policy.json
{
  "displayName": "Cluster Memory",
  "enabled": true,
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Ratio: Memory Requests / Memory Allocatable",
      "conditionThreshold": {
        "filter": "metric.type=\"kubernetes.io/container/memory/request_bytes\" resource.type=\"k8s_container\"",
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "crossSeriesReducer": "REDUCE_SUM",
            "groupByFields": [],
            "perSeriesAligner": "ALIGN_MEAN"
          }
        ],
        "denominatorFilter": "metric.type=\"kubernetes.io/node/memory/allocatable_bytes\" resource.type=\"k8s_node\"",
        "denominatorAggregations": [
          {
            "alignmentPeriod": "60s",
            "crossSeriesReducer": "REDUCE_SUM",
            "groupByFields": [],
            "perSeriesAligner": "ALIGN_MEAN"
          }
        ],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.8,
        "duration": "60s",
        "trigger": {
          "count": 1
        }
      }
    }
  ]
}
Following the official documentation for the groupByFields[] parameter:

The set of fields to preserve when crossSeriesReducer is specified. The groupByFields determine how the time series are partitioned into subsets prior to applying the aggregation operation. Each subset contains time series that have the same value for each of the grouping fields. Each individual time series is a member of exactly one subset. The crossSeriesReducer is applied to each subset of time series. It is not possible to reduce across different resource types, so this field implicitly contains resource.type. Fields not specified in groupByFields are aggregated away. If groupByFields is not specified and all the time series have the same resource type, then the time series are aggregated into a single output time series. If crossSeriesReducer is not defined, this field is ignored.
Please take a specific look at this part:

It is not possible to reduce across different resource types, so this field implicitly contains resource.type.
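To make this concrete, here is a sketch of a numerator aggregation with an explicit grouping field added (the cluster_name grouping is only for illustration; the original policy above uses an empty groupByFields). Even though only cluster_name is listed, resource.type is implicitly part of the grouping, so the k8s_container series of the numerator and the k8s_node series of the denominator can never be reduced into the same subset, which is what the error is about:

{
  "alignmentPeriod": "60s",
  "crossSeriesReducer": "REDUCE_SUM",
  "groupByFields": [
    "resource.label.cluster_name"
  ],
  "perSeriesAligner": "ALIGN_MEAN"
}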
The above error appears when you try to create a policy whose numerator and denominator use different resource types. The metrics in question have the following Resource types:
kubernetes.io/container/memory/request_bytes - k8s_container
kubernetes.io/node/memory/allocatable_bytes - k8s_node
You can check the Resource type by looking at the metric in GCP Monitoring.
As a workaround, you could create an alert policy that triggers when the allocatable memory utilization is above 85%. It will not give you the exact requested/allocatable ratio, but it indirectly tells you that memory consumption on the nodes is high enough to warrant an alarm.
Example below in YAML:
combiner: OR
conditions:
- conditionThreshold:
    aggregations:
    - alignmentPeriod: 60s
      crossSeriesReducer: REDUCE_SUM
      groupByFields:
      - resource.label.cluster_name
      perSeriesAligner: ALIGN_MEAN
    comparison: COMPARISON_GT
    duration: 60s
    filter: metric.type="kubernetes.io/node/memory/allocatable_utilization" resource.type="k8s_node" resource.label."cluster_name"="GKE-CLUSTER-NAME"
    thresholdValue: 0.85
    trigger:
      count: 1
  displayName: Memory allocatable utilization for GKE-CLUSTER-NAME by label.cluster_name [SUM]
  name: projects/XX-YY-ZZ/alertPolicies/AAA/conditions/BBB
creationRecord:
  mutateTime: '2020-03-31T08:29:21.443831070Z'
  mutatedBy: XXX@YYY.com
displayName: alerting-policy-when-allocatable-memory-is-above-85
enabled: true
mutationRecord:
  mutateTime: '2020-03-31T08:29:21.443831070Z'
  mutatedBy: XXX@YYY.com
name: projects/XX-YY-ZZ/alertPolicies/
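Note that the YAML above is the policy as read back from the API, which is why it includes output-only fields such as name, creationRecord and mutationRecord. As a rough sketch (the file name is just an assumption), you would keep only the editable fields (combiner, conditions, displayName, enabled) in a file and create the policy with the same command used for the JSON policy in the question:

$ gcloud alpha monitoring policies create --policy-from-file=allocatable-memory-policy.yaml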
The same policy can also be created through the GCP Monitoring web console.
Please let me know if you have any questions about this.
To create alert policies that show relevant data, you need to take a number of cluster-specific factors into consideration. For a more advanced alert policy that also takes into account the allocatable memory per node pool, you can do something like this:
combiner: OR
conditions:
- conditionThreshold:
    aggregations:
    - alignmentPeriod: 60s
      crossSeriesReducer: REDUCE_SUM
      groupByFields:
      - metadata.user_labels."cloud.google.com/gke-nodepool"
      perSeriesAligner: ALIGN_MEAN
    comparison: COMPARISON_GT
    duration: 60s
    filter: metric.type="kubernetes.io/node/memory/allocatable_utilization" resource.type="k8s_node" resource.label."cluster_name"="CLUSTER_NAME"
    thresholdValue: 0.85
    trigger:
      count: 1
  displayName: Memory allocatable utilization (filtered) (grouped) [SUM]
creationRecord:
  mutateTime: '2020-03-31T18:03:20.325259198Z'
  mutatedBy: XXX@YYY.ZZZ
displayName: allocatable-memory-per-node-pool-above-85
enabled: true
mutationRecord:
  mutateTime: '2020-03-31T18:18:57.169590414Z'
  mutatedBy: XXX@YYY.ZZZ
Please be aware that there is a bug (Groups.google.com: Google Stackdriver discussion), and the only way to create the above alert policy is through the command line.
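As a sketch of that command-line route (the file name and the filter expression are assumptions for illustration), you would save the editable fields of the policy above to a YAML file, create it, and then verify it by listing policies filtered on its display name:

$ gcloud alpha monitoring policies create --policy-from-file=nodepool-memory-policy.yaml
$ gcloud alpha monitoring policies list --filter='displayName=allocatable-memory-per-node-pool-above-85'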