I want to scale my worker pods with an HPA based on the total number of outstanding messages across all of my AWS SQS queues. Since no such metric is available, I created a custom metric using a Lambda function. I am using k8s-cloudwatch-adapter: https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/
I've tested my Lambda function. It returns the correct value, and the metric gets pushed to CloudWatch. My CloudWatch adapter is able to register the external metric as well. I verified it using the command:
$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
However, for some reason the metric returns a null value rather than the correct one. There are no issues with the cloudwatch-adapter permissions, and the HPA doesn't throw any error. It just shows the value as "0" when it should show "15" in my case.
I think it's because of a wrong query in my external metric manifest. This is what my files look like (not including the cloudwatch-adapter manifest files).
Lambda:
import boto3

def lambda_handler(event, context):
    client = boto3.client('sqs')
    listOfQueues = client.list_queues(
        QueueNamePrefix='test'
    )
    # "QueueUrls" is absent from the response when no queue matches the prefix
    listOfQueues = listOfQueues.get("QueueUrls", [])
    #print(listOfQueues)
    numberOfQueues = len(listOfQueues)
    print("Total number of queues: %s" % (numberOfQueues))
    # Sum the approximate number of visible messages over all matching queues
    totalOutstandingMessages = 0
    for i in range(0, numberOfQueues):
        messages = client.get_queue_attributes(
            QueueUrl=listOfQueues[i],
            AttributeNames=[
                'ApproximateNumberOfMessages',
            ]
        )
        # Attribute values are returned as strings
        messages = messages["Attributes"]["ApproximateNumberOfMessages"]
        totalOutstandingMessages = totalOutstandingMessages + int(messages)
    print("Total number of Outstanding Messages: %s" % (totalOutstandingMessages))
    # Publish the total as a custom CloudWatch metric
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        Namespace='CustomSQSMetrics',
        MetricData=[
            {
                'MetricName': 'OutstandingMessagesTest',
                'Dimensions': [
                    {
                        'Name': 'TotalOutStandingMessages',
                        'Value': 'OutStandingMessages'
                    },
                ],
                'Values': [
                    totalOutstandingMessages,
                ],
            },
        ]
    )
    print(response)
External metric manifest:
kind: ExternalMetric
metadata:
  name: outstanding-messages
spec:
  name: outstanding-messages
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld
      metricStat:
        metric:
          namespace: "CustomSQSMetrics"
          metricName: "OutstandingMessagesTest"
          dimensions:
            - name: TotalOutStandingMessages
              value: "OutStandingMessages"
        period: 300
        stat: Maximum
        unit: Count
      returnData: true
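One way to see what CloudWatch actually returns for this query is to send roughly the same query with boto3. This is only a sketch: the function names `build_query` and `fetch_recent_values` are made up for illustration, and the 5-minute lookback window is an assumption, not something taken from the adapter's source.

```python
from datetime import datetime, timedelta, timezone


def build_query(period=300):
    """Mirror the ExternalMetric query as one GetMetricData query entry."""
    return {
        "Id": "sqs_helloworld",
        "MetricStat": {
            "Metric": {
                "Namespace": "CustomSQSMetrics",
                "MetricName": "OutstandingMessagesTest",
                "Dimensions": [
                    {"Name": "TotalOutStandingMessages", "Value": "OutStandingMessages"}
                ],
            },
            "Period": period,
            "Stat": "Maximum",
            "Unit": "Count",
        },
        "ReturnData": True,
    }


def fetch_recent_values(window_minutes=5):
    """Return the values CloudWatch holds for the query in the recent window.

    The window length is an assumption made for this sketch.
    """
    import boto3  # imported lazily so build_query works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=window_minutes)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[build_query()],
        StartTime=start,
        EndTime=end,
    )
    # An empty list here means no data point was published inside the window.
    return response["MetricDataResults"][0]["Values"]
```

Calling `fetch_recent_values()` with valid AWS credentials and getting an empty list would suggest that nothing was published recently, rather than that the query itself is wrong.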
HPA:
apiVersion: autoscaling/v2beta1
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: outstanding-messages
      targetValue: 12
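For reference, the replica count the HPA should arrive at with these settings can be approximated as below. This is a simplified sketch of the documented formula desiredReplicas = ceil(currentReplicas * currentValue / targetValue), with the default 0.1 tolerance and clamping to minReplicas/maxReplicas. The function name `desired_replicas` is hypothetical, and the sketch ignores stabilization windows and pod readiness.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA calculation for an External metric with targetValue."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 15, 12))  # 2
print(desired_replicas(1, 0, 12))   # 1
```

So with a metric value of 15 and one running pod, a scale-up to 2 replicas would be expected, while a reported value of 0 keeps the deployment at minReplicas.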
This got resolved. The metric data was being pushed to CloudWatch only when I deployed or tested my Lambda manually, so at the moment the external metric tried to fetch the value, there was no recent data point and it received a null value. I added a cron schedule to my Lambda so that it runs every minute. Data is now pushed to CloudWatch every minute and is always available for the external metric to pick up. After this change, the external metric was able to get the data and the HPA was able to scale my pods.
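For anyone who wants to set up the schedule from code rather than the console, a minimal sketch with boto3 might look like the following. The helper names `schedule_kwargs` and `schedule_lambda`, the rule name, and the target id are placeholders chosen for illustration, not what I actually used.

```python
def schedule_kwargs(rule_name, function_arn, rate="rate(1 minute)"):
    """Build the arguments for the EventBridge put_rule and put_targets calls."""
    return {
        "rule": {
            "Name": rule_name,
            "ScheduleExpression": rate,
            "State": "ENABLED",
        },
        "targets": {
            "Rule": rule_name,
            "Targets": [{"Id": "1", "Arn": function_arn}],
        },
    }


def schedule_lambda(rule_name, function_name, function_arn):
    """Create a schedule rule and point it at the Lambda function."""
    import boto3  # imported lazily so schedule_kwargs works without AWS access

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")
    kwargs = schedule_kwargs(rule_name, function_arn)

    # Create (or update) the rule that fires on the given schedule.
    rule_arn = events.put_rule(**kwargs["rule"])["RuleArn"]

    # Allow EventBridge to invoke the function for this rule.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Attach the function as the rule's target.
    events.put_targets(**kwargs["targets"])
    return rule_arn
```

With a one-minute schedule, every 300-second period queried by the external metric should contain at least one data point.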