Kubernetes HPA Scales Up Rapidly with Custom Metric

4/13/2021

I have a Spring Boot application running on GKE that takes about 7 minutes to become ready. I created an HPA based on a custom requests-per-second metric, as follows:

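apiVersion: "autoscaling/v2beta2"   # Pods metrics with a metric/target block use autoscaling/v2beta2 on Kubernetes 1.17
kind: "HorizontalPodAutoscaler"
metadata:
  name: X
  namespace: X
spec:
  maxReplicas: 10
  minReplicas: 3
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "X"
  metrics:
    - type: "Pods"
      pods:
        metric:
          name: "istio_requests_per_second"   # custom per-pod metric
        target:
          type: "AverageValue"
          averageValue: 30                    # target RPS per pod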

The istio_requests_per_second metric already calculates the average RPS across the available pods, so every pod reports the same value. For example, with 150 RPS in total and 5 available pods, istio_requests_per_second is 30.

When istio_requests_per_second rises slightly above 30, the HPA keeps spawning pods until one of the newly created pods is ready to take a share of the requests (say, the extra 2 RPS when the metric rises to 32 RPS). That makes sense: new pods receive no requests until they are ready, so the metric stays above target while the HPA tries to bring it back to 30.
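The Kubernetes HPA documentation gives the per-sync calculation as the first formula below. As a rough worked example, assume the per-pod value stays at 34 RPS for several sync periods because none of the new pods is ready yet, and ignore the controller's readiness and scale-up-limit handling; the replica count then ratchets up to maxReplicas. The example uses 34 rather than 32 because, with the default --horizontal-pod-autoscaler-tolerance of 0.1, a ratio of 32/30 falls inside the tolerance band and does not trigger a scale-up on its own.

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% 32 RPS: |32/30 - 1| \approx 0.067 \le 0.1, so no scaling.
% 34 RPS, metric assumed constant across syncs:
\begin{aligned}
5 &\to \lceil 5 \times 34/30 \rceil = \lceil 5.67 \rceil = 6 \\
6 &\to \lceil 6 \times 34/30 \rceil = \lceil 6.80 \rceil = 7 \\
7 &\to \lceil 7 \times 34/30 \rceil = \lceil 7.93 \rceil = 8 \\
8 &\to \lceil 8 \times 34/30 \rceil = \lceil 9.07 \rceil = 10 \quad (= \text{maxReplicas})
\end{aligned}
```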

The problem is that I don't want the HPA to spawn tens of pods when the RPS increases only slightly. For example, at 32 RPS one new pod should be sufficient. I believe the root cause is the long startup time, which creates an autoscaling lag between the scale-up decision and the moment the new pods become ready.

Because I'm running on GKE, I cannot change kube-controller-manager flags such as --horizontal-pod-autoscaler-sync-period.

I'm also running Kubernetes 1.17, so the behavior field for configuring gradual scaling is out of the question. Besides, I don't want to limit scaling in general: istio_requests_per_second could genuinely spike above 100 RPS.

TL;DR: How do I configure a Kubernetes HPA not to spawn tens of pods on a slight increase in requests per second, for an application that starts slowly?

-- Hmerac
autoscaling
google-kubernetes-engine
horizontal-pod-autoscaling
kubernetes

1 Answer

4/21/2021

To narrow down the pod scaling options, you could base your autoscaling on multiple metrics.

For a better grasp of how this works, the GKE documentation has an example of autoscaling based on a custom or external metric.
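A minimal sketch of a multi-metric HPA, assuming autoscaling/v2beta2 (available on Kubernetes 1.17). The CPU metric and its 70% threshold are hypothetical choices for illustration; CPU utilization is measured against the containers' CPU requests, so those must be set on the Deployment. Keep in mind that the HPA calculates a proposed replica count for each metric and uses the largest, so an additional metric can raise the replica count but not lower it below what istio_requests_per_second alone would produce.

```yaml
apiVersion: "autoscaling/v2beta2"
kind: "HorizontalPodAutoscaler"
metadata:
  name: X
  namespace: X
spec:
  maxReplicas: 10
  minReplicas: 3
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "X"
  metrics:
    # Custom per-pod metric from the question.
    - type: "Pods"
      pods:
        metric:
          name: "istio_requests_per_second"
        target:
          type: "AverageValue"
          averageValue: 30
    # Hypothetical second metric: average CPU utilization across pods,
    # as a percentage of the containers' CPU requests.
    - type: "Resource"
      resource:
        name: "cpu"
        target:
          type: "Utilization"
          averageUtilization: 70
```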

-- Nahuel
Source: StackOverflow