I'm trying to enable AWS EKS autoscaling based on a custom CloudWatch metric via the Kubernetes CloudWatch adapter. I have pushed custom metrics to AWS CloudWatch and validated that they appear in the CloudWatch console and are retrievable with the boto3 client's get_metric_data. This is the code I use to publish my custom metric to CloudWatch:
import boto3
from datetime import datetime

client = boto3.client('cloudwatch')
cloudwatch_response = client.put_metric_data(
    Namespace='TestMetricNS',
    MetricData=[
        {
            'MetricName': 'TotalUnprocessed',
            'Timestamp': datetime.now(),
            'Value': 40,
            'Unit': 'Megabytes',
        }
    ]
)
I have the following YAML files for setting up the external metric and the HPA autoscaler in Kubernetes:
extMetricCustom.yaml:
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: test-custom-metric
spec:
  name: test-custom-metric
  resource:
    resource: "deployment"
  queries:
    - id: sqs_test
      metricStat:
        metric:
          namespace: "TestMetricNS"
          metricName: "TotalUnprocessed"
        period: 60
        stat: Average
        unit: Megabytes
      returnData: true
hpaCustomMetric.yaml:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: test-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: External
      external:
        metricName: test-custom-metric
        targetAverageValue: 2
When I check whether the Kubernetes CloudWatch adapter is picking up my custom metric (kubectl get hpa), the metric always shows as 0:
NAME          REFERENCE                 TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
test-scaler   Deployment/sqs-consumer   0/2 (avg)   1         4         1          161m
How can I properly autoscale based on my CloudWatch custom metric?
Worked with OP on this out-of-band and still had the tab open for this question later in the day, so posting the outcome here for posterity for anyone that stumbles upon it.
The root cause of the issue was a timezone mismatch. The metrics monitor looks at "current" metrics, but the following line from the metric generator script produced timestamps in the local timezone with no timezone specified:
'Timestamp': datetime.now(),
Since there was "no data" at the current time (only data X hours in the past, due to a -X UTC offset), the system did not initiate scaling, because the value was effectively "0"/nil/null. Instead, a UTC timestamp can be specified so that the generated metrics are current:
'Timestamp': datetime.utcnow(),
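One caveat: `datetime.utcnow()` still returns a naive datetime, and it is deprecated as of Python 3.12. A timezone-aware timestamp avoids relying on how a naive value gets interpreted. The following is only a sketch of that idea, not what OP ended up running: the helper names `build_datum` and `publish` are made up for illustration, and the namespace and metric name are taken from the question.

```python
from datetime import datetime, timezone


def build_datum(value, now=None):
    """Build one MetricData entry with an explicit UTC timestamp.

    `build_datum` is a hypothetical helper, not part of boto3.
    """
    # An aware datetime carries its own offset, so the result does not
    # depend on the local timezone of the machine running the script.
    timestamp = now if now is not None else datetime.now(timezone.utc)
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return {
        "MetricName": "TotalUnprocessed",
        "Timestamp": timestamp.astimezone(timezone.utc),
        "Value": value,
        "Unit": "Megabytes",
    }


def publish(client, value):
    """Send one data point; `client` is expected to be a boto3 CloudWatch client."""
    return client.put_metric_data(
        Namespace="TestMetricNS",
        MetricData=[build_datum(value)],
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   publish(boto3.client("cloudwatch"), 40)
```

Passing the client in as an argument keeps the timestamp logic easy to check without touching AWS.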
A secondary consideration was that the Kubernetes nodes need permission to poll the metrics from CloudWatch. This is done by attaching this policy to the nodes' IAM role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}
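If you would rather attach the policy from a script than through the console, something along these lines should work with boto3's IAM client. Treat it as a sketch under assumptions: `attach_policy`, the policy name, and the role name in the usage comment are placeholders, and it adds the statement as an inline policy on the role, which is only one of several ways to grant the permission.

```python
import json

# Same policy document as above, expressed as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["cloudwatch:GetMetricData"],
            "Resource": "*",
        }
    ],
}


def attach_policy(iam_client, role_name, policy_name="cloudwatch-get-metric-data"):
    """Add POLICY to the given role as an inline policy.

    `attach_policy` and the default policy name are hypothetical;
    `iam_client` is expected to be a boto3 IAM client.
    """
    return iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(POLICY),  # the API expects a JSON string
    )


# Usage (requires boto3 and IAM permissions; the role name is a placeholder):
#   import boto3
#   attach_policy(boto3.client("iam"), "my-eks-node-role")
```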