Spiky Kubernetes HPA with Pub/Sub unacked messages metric

11/22/2018

Currently we have a data streaming pipeline: API call -> Google Pub/Sub -> BigQuery. The number of API calls depends on the traffic on the website.

We created a Kubernetes deployment (in GKE) that ingests data from Pub/Sub into BigQuery. This deployment has a horizontal pod autoscaler (HPA) with metricName: pubsub.googleapis.com|subscription|num_undelivered_messages and targetValue: "5000". This setup is able to autoscale when traffic increases suddenly. However, it causes spiky scaling.

What I mean by spiky is the following:

  1. The number of unacked messages goes above the target value.
  2. The autoscaler increases the number of pods.
  3. The number of unacked messages starts to decrease slowly, but since it is still above the target value the autoscaler keeps adding pods, until we hit the maximum number of pods in the autoscaler.
  4. The number of unacked messages decreases until it goes below the target and then stays very low.
  5. The autoscaler reduces the number of pods to the minimum.
  6. The number of unacked messages increases again, the situation from (1) repeats, and we end up in a loop/cycle of spikes.

Here is a chart of the spiky behavior (the traffic is going up, but it is stable and non-spiky): [chart: the spiky number of unacknowledged messages in Pub/Sub]

We set an alert in Stackdriver that fires if the number of unacknowledged messages is more than 20k, and in this situation it is triggered frequently.

Is there a way to make the HPA more stable (non-spiky) in this case?

Any comment, suggestion, or answer is much appreciated.

Thanks!

-- Yosua Michael
autoscaling
google-cloud-platform
google-cloud-pubsub
google-kubernetes-engine
kubernetes

1 Answer

5/29/2019

I've been dealing with the same behavior. What I ended up doing is smoothing num_undelivered_messages with a moving average. I set up a Kubernetes CronJob that publishes the average of the last 20 minutes of time series data to a custom metric every minute, then configured the HPA to respond to the custom metric (the reading side is sketched below).
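To make the reading side concrete, here is a minimal sketch in Go of how that CronJob might fetch the 20-minute mean of num_undelivered_messages. The answer only says it used the Stackdriver/monitoring Go packages, so the exact calls below (cloud.google.com/go/monitoring/apiv3 with ListTimeSeries and a mean aggregation) are my assumption, and the project ID and subscription ID are placeholders; the 20-minute window comes from the answer.

    // Sketch: read the mean of num_undelivered_messages over the last 20 minutes.
    // Project and subscription IDs are placeholders.
    package main

    import (
    	"context"
    	"fmt"
    	"log"
    	"time"

    	monitoring "cloud.google.com/go/monitoring/apiv3"
    	"google.golang.org/api/iterator"
    	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
    	"google.golang.org/protobuf/types/known/durationpb"
    	"google.golang.org/protobuf/types/known/timestamppb"
    )

    func main() {
    	ctx := context.Background()
    	client, err := monitoring.NewMetricClient(ctx)
    	if err != nil {
    		log.Fatalf("NewMetricClient: %v", err)
    	}
    	defer client.Close()

    	const projectID = "my-project"  // placeholder
    	const subscriptionID = "my-sub" // placeholder
    	now := time.Now()

    	// Let Cloud Monitoring do the smoothing: align each series to a single
    	// 20-minute mean and reduce across series to one value.
    	req := &monitoringpb.ListTimeSeriesRequest{
    		Name: "projects/" + projectID,
    		Filter: fmt.Sprintf(
    			`metric.type="pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.labels.subscription_id=%q`,
    			subscriptionID),
    		Interval: &monitoringpb.TimeInterval{
    			StartTime: timestamppb.New(now.Add(-20 * time.Minute)),
    			EndTime:   timestamppb.New(now),
    		},
    		Aggregation: &monitoringpb.Aggregation{
    			AlignmentPeriod:    durationpb.New(20 * time.Minute),
    			PerSeriesAligner:   monitoringpb.Aggregation_ALIGN_MEAN,
    			CrossSeriesReducer: monitoringpb.Aggregation_REDUCE_MEAN,
    		},
    	}

    	it := client.ListTimeSeries(ctx, req)
    	for {
    		ts, err := it.Next()
    		if err == iterator.Done {
    			break
    		}
    		if err != nil {
    			log.Fatalf("ListTimeSeries: %v", err)
    		}
    		for _, p := range ts.GetPoints() {
    			// ALIGN_MEAN turns the int64 gauge into a double value.
    			fmt.Printf("20-minute mean backlog: %f\n", p.GetValue().GetDoubleValue())
    		}
    	}
    }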

This worked pretty well, but not perfectly. I observed that as soon as the average converges on the actual value, the HPA scales the service down too low. So I ended up adding a constant, so the custom metric is just average + constant. For my specific case I found that a value of 25,000 worked well.
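A similarly hedged sketch of the publishing side, writing average + constant to a custom metric. The custom metric type name (custom.googleapis.com/pubsub/smoothed_num_undelivered_messages) and the use of a "global" monitored resource are assumptions I made for illustration; the 25,000 offset is the value mentioned above, and the average would come from the previous sketch.

    // Sketch: publish (average + constant) as a custom gauge metric the HPA can target.
    // The metric type name and "global" resource are assumptions, not from the answer.
    package main

    import (
    	"context"
    	"log"
    	"time"

    	monitoring "cloud.google.com/go/monitoring/apiv3"
    	metricpb "google.golang.org/genproto/googleapis/api/metric"
    	monitoredrespb "google.golang.org/genproto/googleapis/api/monitoredres"
    	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
    	"google.golang.org/protobuf/types/known/timestamppb"
    )

    const (
    	projectID = "my-project" // placeholder
    	offset    = 25000.0      // the constant from the answer; tune for your workload
    )

    // publishSmoothedBacklog writes one gauge point for the smoothed backlog.
    func publishSmoothedBacklog(ctx context.Context, client *monitoring.MetricClient, average float64) error {
    	req := &monitoringpb.CreateTimeSeriesRequest{
    		Name: "projects/" + projectID,
    		TimeSeries: []*monitoringpb.TimeSeries{{
    			Metric: &metricpb.Metric{
    				// Hypothetical custom metric type for the HPA to consume.
    				Type: "custom.googleapis.com/pubsub/smoothed_num_undelivered_messages",
    			},
    			Resource: &monitoredrespb.MonitoredResource{
    				Type:   "global",
    				Labels: map[string]string{"project_id": projectID},
    			},
    			Points: []*monitoringpb.Point{{
    				// Gauge points only need an end time.
    				Interval: &monitoringpb.TimeInterval{
    					EndTime: timestamppb.New(time.Now()),
    				},
    				Value: &monitoringpb.TypedValue{
    					Value: &monitoringpb.TypedValue_DoubleValue{DoubleValue: average + offset},
    				},
    			}},
    		}},
    	}
    	return client.CreateTimeSeries(ctx, req)
    }

    func main() {
    	ctx := context.Background()
    	client, err := monitoring.NewMetricClient(ctx)
    	if err != nil {
    		log.Fatalf("NewMetricClient: %v", err)
    	}
    	defer client.Close()

    	// In the real CronJob, `average` would be the 20-minute mean
    	// computed in the previous sketch.
    	if err := publishSmoothedBacklog(ctx, client, 12345.0); err != nil {
    		log.Fatalf("CreateTimeSeries: %v", err)
    	}
    }

Publishing the smoothed value as its own gauge is what lets the HPA target it (with its own targetAverageValue) instead of num_undelivered_messages directly.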

With this, and after dialing in the targetAverageValue, the autoscaling has been very stable.

I'm not sure if this is due to a defect or just the nature of the num_undelivered_messages metric at very high loads.

Edit: I used the Stackdriver/monitoring Go packages. There is a straightforward way to aggregate the time series data; see 'Aggregating data' at https://cloud.google.com/monitoring/custom-metrics/reading-metrics

https://cloud.google.com/monitoring/custom-metrics/creating-metrics

-- Ian Herbert
Source: StackOverflow