Kubernetes HPA with external metric: my external metric is not returning the correct value

1/7/2022

I want to scale my worker pods with an HPA based on the total number of outstanding messages across all of my AWS SQS queues. Since no such metric exists, I created a custom metric with a Lambda function. I am using k8s-cloudwatch-adapter: https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

I've tested my Lambda function: it returns the correct value, and the metric is pushed to CloudWatch. My CloudWatch adapter is also able to register the external metric, which I verified with this command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

For some reason, though, it returns a null value instead of the correct one. There are no issues with the cloudwatch-adapter permissions, and the HPA doesn't throw any error; it just shows the value as "0" when it should be "15" in my case.
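The listing endpoint used above only shows that the metric is registered; it does not return a value. The value itself is served under a namespaced path such as `/apis/external.metrics.k8s.io/v1beta1/namespaces/default/outstanding-messages` (assuming the HPA lives in the `default` namespace). A minimal sketch for reading the numbers out of that `ExternalMetricValueList` response; the helper names are hypothetical, and the quantity parsing is deliberately simplified to plain numbers and the milli (`m`) suffix:

```python
import json
from decimal import Decimal


def parse_quantity(quantity: str) -> Decimal:
    """Parse a Kubernetes quantity string (simplified: plain or milli-suffixed only)."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def external_metric_values(raw_json: str) -> list:
    """Return (metricName, value) pairs from an ExternalMetricValueList JSON body."""
    body = json.loads(raw_json)
    # "items" may be missing, null or empty when the adapter has nothing to report.
    return [
        (item["metricName"], parse_quantity(item["value"]))
        for item in body.get("items") or []
    ]
```

For example, save the output of `kubectl get --raw` for that path to a file and pass its contents to `external_metric_values`; an empty list means the adapter had no datapoint to serve.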

I suspect the queries in my external metric manifest are wrong. This is what my files look like (not including the cloudwatch-adapter manifest files):

Lambda:

import boto3


def lambda_handler(event, context):
    client = boto3.client('sqs')

    # List all queues whose names start with "test".
    # "QueueUrls" is absent from the response when no queue matches.
    response = client.list_queues(QueueNamePrefix='test')
    queue_urls = response.get('QueueUrls', [])
    print("Total number of queues: %s" % len(queue_urls))

    # Sum the approximate number of visible messages across all queues.
    total_outstanding_messages = 0
    for queue_url in queue_urls:
        attributes = client.get_queue_attributes(
            QueueUrl=queue_url,
            AttributeNames=['ApproximateNumberOfMessages'],
        )
        total_outstanding_messages += int(
            attributes['Attributes']['ApproximateNumberOfMessages']
        )
    print("Total number of outstanding messages: %s" % total_outstanding_messages)

    # Publish the total as a custom CloudWatch metric. Namespace, metric name
    # and dimension must match the ExternalMetric query exactly.
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        Namespace='CustomSQSMetrics',
        MetricData=[
            {
                'MetricName': 'OutstandingMessagesTest',
                'Dimensions': [
                    {
                        'Name': 'TotalOutStandingMessages',
                        'Value': 'OutStandingMessages',
                    },
                ],
                'Values': [total_outstanding_messages],
                # Matches "unit: Count" in the ExternalMetric query; CloudWatch only
                # returns datapoints stored with the unit that a query asks for.
                'Unit': 'Count',
            },
        ],
    )
    print(response)
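The handler above reads only the first page of `list_queues`, which returns at most 1,000 queue URLs. If the number of matching queues could grow past that, the summing step might follow `NextToken` instead (SQS only returns it when `MaxResults` is set). A sketch under that assumption; `count_outstanding_messages` is a hypothetical name, and the SQS client is passed in so it can be tried with a stub:

```python
def count_outstanding_messages(sqs_client, prefix: str) -> int:
    """Sum ApproximateNumberOfMessages over every queue whose name starts with prefix."""
    total = 0
    kwargs = {"QueueNamePrefix": prefix, "MaxResults": 1000}
    while True:
        page = sqs_client.list_queues(**kwargs)
        # "QueueUrls" is omitted from a page that has no matches.
        for queue_url in page.get("QueueUrls", []):
            attrs = sqs_client.get_queue_attributes(
                QueueUrl=queue_url,
                AttributeNames=["ApproximateNumberOfMessages"],
            )
            total += int(attrs["Attributes"]["ApproximateNumberOfMessages"])
        token = page.get("NextToken")
        if not token:
            return total
        # Request the next page of queue URLs.
        kwargs["NextToken"] = token
```

Inside `lambda_handler` this would be called as `count_outstanding_messages(boto3.client('sqs'), 'test')`.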

External metric manifest:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: outstanding-messages
spec:
  name: outstanding-messages
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld
      metricStat:
        metric:
          namespace: "CustomSQSMetrics"
          metricName: "OutstandingMessagesTest"
          dimensions:
            - name: TotalOutStandingMessages
              value: "OutStandingMessages"
        period: 300
        stat: Maximum
        unit: Count
      returnData: true
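When the HPA reports 0, one way to check whether CloudWatch holds any data for this query is to send the same query directly with `get_metric_data`. The sketch below translates the `queries` entry from the manifest into a request; if `Values` comes back empty, there is no datapoint in that window to return. The 10-minute lookback and the helper names are assumptions for illustration, not the adapter's actual window:

```python
from datetime import datetime, timedelta, timezone


def build_metric_query() -> dict:
    """MetricDataQueries entry mirroring the ExternalMetric 'queries' block."""
    return {
        "Id": "sqs_helloworld",
        "MetricStat": {
            "Metric": {
                "Namespace": "CustomSQSMetrics",
                "MetricName": "OutstandingMessagesTest",
                "Dimensions": [
                    {"Name": "TotalOutStandingMessages", "Value": "OutStandingMessages"},
                ],
            },
            "Period": 300,
            "Stat": "Maximum",
            # Only datapoints stored with this unit are returned.
            "Unit": "Count",
        },
        "ReturnData": True,
    }


def latest_value(cloudwatch_client, lookback=timedelta(minutes=10)):
    """Return the newest datapoint in the lookback window, or None if there is none."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_data(
        MetricDataQueries=[build_metric_query()],
        StartTime=end - lookback,
        EndTime=end,
        ScanBy="TimestampDescending",  # newest datapoint first
    )
    values = response["MetricDataResults"][0]["Values"]
    return values[0] if values else None
```

With real credentials this would be called as `latest_value(boto3.client("cloudwatch"))`.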

HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: outstanding-messages
      targetValue: 12
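For an `External` metric with `targetValue`, the HPA scales on the ratio between the current value and the target. A simplified sketch of that calculation (it ignores pod readiness, scale-down stabilization and rate limits; 0.1 is the controller's default tolerance) shows why a metric that reads 0 keeps the deployment at `minReplicas`, while 15 against a target of 12 asks for more pods:

```python
import math


def desired_replicas(current_replicas: int, metric_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica count for an External metric with a targetValue."""
    ratio = metric_value / target_value
    if current_replicas > 0 and abs(1.0 - ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposed = current_replicas
    else:
        proposed = math.ceil(ratio * max(current_replicas, 1))
    # Clamp to the bounds declared in the HPA spec.
    return max(min_replicas, min(max_replicas, proposed))
```

With one running replica, `desired_replicas(1, 15, 12)` gives 2, whereas `desired_replicas(1, 0, 12)` gives 1, i.e. a metric stuck at 0 never triggers a scale-up.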
-- Aman Deep
amazon-cloudwatch
amazon-sqs
amazon-web-services
kubernetes
kubernetes-hpa

1 Answer

1/7/2022

This got resolved. The problem was that metric data was pushed to CloudWatch only when I deployed or tested my Lambda manually, so whenever the external metric tried to read the value, there was no recent datapoint and it received a null value. I added a cron schedule to my Lambda so that it runs every minute. Since then, data is pushed to CloudWatch every minute and is always available for the external metric to pick up. After this change, the external metric was able to get the data and the HPA was able to scale my pods.
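The schedule can be set up in the console; as a boto3 sketch, it amounts to an EventBridge rule with `rate(1 minute)`, a target pointing at the function, and a permission that lets EventBridge invoke it. The rule name, target ID and statement ID below are hypothetical placeholders, and the clients are passed in so the function can be tried with stubs. Note that `add_permission` fails if the statement ID already exists, so this is not idempotent as written:

```python
def schedule_every_minute(events_client, lambda_client, function_name: str,
                          function_arn: str,
                          rule_name: str = "sqs-metric-every-minute") -> str:
    """Invoke the metric-publishing Lambda once a minute via an EventBridge rule."""
    # Scheduled rule that fires every minute.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 minute)",
        State="ENABLED",
    )["RuleArn"]
    # Point the rule at the Lambda function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "metric-lambda", "Arn": function_arn}],
    )
    # Resource-based permission so EventBridge is allowed to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Publishing once a minute leaves several datapoints inside each 300-second period of the ExternalMetric query, so the `Maximum` statistic always has something to aggregate.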

-- Aman Deep
Source: StackOverflow