How to make Horizontal Pod Autoscaler scale down pod replicas on a percentage decrease threshold?

8/16/2019

I am looking for a syntax/condition for a percentage-decrease threshold that can be added to the HPA.yaml file, so that the Horizontal Pod Autoscaler starts decreasing the pod replicas when the CPU utilization falls by that particular percentage.

Consider this scenario: I set targetCPUUtilizationPercentage to 50, minReplicas to 1 and maxReplicas to 5. Now let's assume the CPU utilization goes above 50 and reaches 100, causing the HPA to create 2 replicas. Even if the utilization then decreases to 51%, the HPA will not terminate a pod replica.

Is there any way to make the scale-down conditional on a percentage decrease in CPU utilization?

Just like targetCPUUtilizationPercentage, I would like to be able to specify something like targetCPUUtilizationPercentageDecrease and assign it the value 30, so that when the CPU utilization falls from 100% to 70%, the HPA terminates a pod replica, and after a further 30% decrease, when it reaches 40%, the remaining extra pod replica is terminated as well.

-- Prabhat Nagpal
devops
kubernetes
kubernetes-pod

1 Answer

8/21/2019

According to online resources, this feature is still being worked on by the community; see "Configurable HorizontalPodAutoscaler options".

I haven't tried it, but as a workaround you can create custom metrics, e.g. using Prometheus Adapter (see "Horizontal pod auto scaling by using custom metrics"), in order to have more control over the limits used.

At the moment you can use the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization to control the delay between scale-down operations:

The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
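As far as I understand the current upstream documentation, the stabilization setting works as a window: the HPA remembers its recent replica recommendations and scales down only to the highest one seen within the window. Here is a minimal Python sketch of that idea, not the controller's actual code. The function name stabilized_replicas, its parameters and the sample history are all made up for illustration.

```python
from datetime import datetime, timedelta


def stabilized_replicas(history, current, now, window=timedelta(minutes=5)):
    """Pick the replica count to apply after downscale stabilization.

    history: list of (timestamp, recommended_replicas) pairs from earlier runs
    current: the recommendation computed right now
    window:  stabilization window (assumed here to be the 5m0s default)
    """
    # Keep only recommendations that are still inside the window.
    recent = [replicas for ts, replicas in history if now - ts <= window]
    # Scale down no further than the highest recent recommendation.
    return max([current] + recent)


now = datetime(2019, 8, 21, 12, 0, 0)
history = [
    (now - timedelta(minutes=6), 5),  # too old, ignored
    (now - timedelta(minutes=3), 2),  # still inside the window
]
print(stabilized_replicas(history, 1, now))
print(stabilized_replicas(history, 1, now + timedelta(minutes=3)))
```

In this sketch the first call keeps 2 replicas because a recommendation of 2 is still inside the window, and the second call, three minutes later, lets the count drop to 1.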

On the other hand, this behavior is expected given the use cases the HPA is designed for:

Applications that process very important data events. These should scale up as fast as possible (to reduce the data processing time), and scale down as soon as possible (to reduce cost).
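To see why the replica count stays at 2 when utilization drops to 51%, it helps to look at the scaling rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). Below is a minimal Python sketch of that rule, not the controller's actual code. The function name desired_replicas and its parameters are made up for illustration, and details such as missing metrics and pod readiness are ignored.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Approximate the documented HPA rule for a CPU utilization target.

    current_utilization is the average across all pods, in percent.
    """
    ratio = current_utilization / target_utilization
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 100, 50))  # 1 pod at 100% with a 50% target
print(desired_replicas(2, 51, 50))   # 2 pods at 51%
print(desired_replicas(2, 25, 50))   # 2 pods at 25%
```

With the numbers from the question, 1 pod at 100% scales up to 2, 2 pods at 51% stay at 2, and the count only drops back to 1 once the average utilization across the 2 pods falls to 25% or lower. So the scale-down point is implied by the target and the current replica count rather than set separately.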

Hope this helps.

-- Hanx
Source: StackOverflow