Kubernetes HPA Auto Scaling Velocity

10/24/2019

We have defined an HPA for an application with a minimum of 1 and a maximum of 4 replicas, and 80% CPU as the threshold.
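
Roughly, the HPA was created like this (a sketch using the Kubernetes Python client; the Deployment name "my-app" and the "default" namespace are placeholders):

    # Sketch only: an autoscaling/v1 HPA with min 1, max 4 replicas and an
    # 80% target CPU utilization. "my-app" and "default" are placeholders.
    from kubernetes import client, config

    config.load_kube_config()

    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="my-app"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="my-app"),
            min_replicas=1,
            max_replicas=4,
            target_cpu_utilization_percentage=80,
        ),
    )
    client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa)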

What we wanted was: if pod CPU goes beyond 80%, the app should be scaled up one replica at a time. Instead, the application is getting scaled up to the maximum number of replicas.

How can we define the scale velocity so that only 1 pod is added at a time? And then, if one of the pods again consumes more than 80% CPU, scale up one more pod, rather than jumping straight to the maximum number of replicas.

Let me know how we can achieve this.

-- P Ekambaram
kubernetes
kubernetes-hpa

3 Answers

10/24/2019

This page - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details - explains the algorithm the HPA uses, including the formula for calculating the number of "desired replicas".

If I recall correctly, there were some (positive) changes to the HPA algorithm in v1.12.

-- apisim
Source: StackOverflow

10/24/2019

As of today, the HPA has full control over scale-up. You can only fine-tune the scale-down operation with the following kube-controller-manager flag:

--horizontal-pod-autoscaler-downscale-stabilization
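
To illustrate what this flag does (a rough model, not the actual controller code): during the stabilization window the HPA remembers its recent replica recommendations and, when scaling down, applies the largest one still inside the window, so the replica count only drops once the high recommendations age out.

    import time
    from collections import deque

    # Rough model of downscale stabilization (default window: 5 minutes).
    class DownscaleStabilizer:
        def __init__(self, window_seconds=300):
            self.window = window_seconds
            self.history = deque()  # (timestamp, recommended_replicas)

        def stabilize(self, recommendation, now=None):
            now = time.time() if now is None else now
            self.history.append((now, recommendation))
            # Drop recommendations that have fallen out of the window.
            while self.history and now - self.history[0][0] > self.window:
                self.history.popleft()
            # When scaling down, the largest recent recommendation wins.
            return max(replicas for _, replicas in self.history)

So with the default 5-minute window, a workload that was recommended 4 replicas stays at 4 for at least 5 minutes, even if the metric drops immediately.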

The good news is that there is an open proposal, "Configurable scale up/down velocity for HPA", that addresses exactly this.

-- Mesut
Source: StackOverflow

10/26/2019

First of all, the 80% CPU utilisation is not a threshold but a target value.

The HPA algorithm for calculating the desired number of replicas is based on the following formula:

X = N * (C/T)

Where:

  • X: desired number of replicas
  • N: current number of replicas
  • C: current value of the metric
  • T: target value for the metric

In other words, the algorithm aims at calculating a replica count that keeps the observed metric value as close as possible to the target value.

In your case, this means if the average CPU utilisation across the pods of your app is below 80%, the HPA tends to decrease the number of replicas (to make the CPU utilisation of the remaining pods go up). On the other hand, if the average CPU utilisation across the pods is above 80%, the HPA tends to increase the number of replicas, so that the CPU utilisation of the individual pods decreases.

The number of replicas that are added or removed in a single step depends on how far apart the current metric value is from the target value and on the current number of replicas. This decision is internal to the HPA algorithm and you can't directly influence it. The only contract that the HPA has with its users is to keep the metric value as close as possible to the target value.
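
A quick numerical sketch of that formula (the CPU figures are made-up, and the result is rounded up to a whole number of pods and clamped to the configured min/max) shows why a single heavily loaded pod can jump straight to the maximum:

    import math

    def desired_replicas(current_replicas, current_cpu, target_cpu,
                         min_replicas=1, max_replicas=4):
        # X = N * (C / T), rounded up and clamped to the min/max bounds.
        desired = math.ceil(current_replicas * current_cpu / target_cpu)
        return max(min_replicas, min(max_replicas, desired))

    print(desired_replicas(1, 90, 80))   # 2 -> only one replica is added
    print(desired_replicas(1, 350, 80))  # 4 -> jumps straight to the maximum
    print(desired_replicas(4, 40, 80))   # 2 -> scales down when under target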

If you need a very specific autoscaling behaviour, you can write a custom controller (or operator) to autoscale your application instead of using the HPA.
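
For example, a very reduced sketch of such a controller with the Kubernetes Python client could add exactly one replica per iteration whenever the observed utilisation is above the target. Here get_average_cpu_utilisation is a hypothetical placeholder for however you obtain the metric (e.g. from the metrics.k8s.io API), and "my-app"/"default" are placeholders:

    import time
    from kubernetes import client, config

    TARGET_UTILISATION = 80   # percent
    MAX_REPLICAS = 4

    def get_average_cpu_utilisation(namespace, deployment):
        # Hypothetical placeholder: fetch pod CPU usage (e.g. via the
        # metrics.k8s.io API) and return the average utilisation in percent.
        raise NotImplementedError

    def scale_one_at_a_time(namespace="default", deployment="my-app"):
        config.load_kube_config()
        apps = client.AppsV1Api()
        while True:
            scale = apps.read_namespaced_deployment_scale(deployment, namespace)
            current = scale.spec.replicas
            if (get_average_cpu_utilisation(namespace, deployment) > TARGET_UTILISATION
                    and current < MAX_REPLICAS):
                # Add exactly one replica per control-loop iteration.
                apps.patch_namespaced_deployment_scale(
                    deployment, namespace, {"spec": {"replicas": current + 1}})
            time.sleep(60)

Note that such a controller would replace the HPA rather than run alongside it, so that the two don't fight over the replica count.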

-- weibeld
Source: StackOverflow