I was wondering whether the autoscaling feature in Kubernetes is a reactive or a proactive approach, and whether it is rule-based only.
Please let me know.
Thank you.
It entirely depends on how you define reactive and proactive. On the one hand I would say reactive, as the metrics the autoscaling decision is based upon need to reach a certain value before the autoscaling process can take place. An action is proactive only when it is based on prediction or anticipation of certain events, e.g. a load increase: say you are expecting that, due to a promotion campaign you're launching next week, the load on your app will increase about 3 times.
I would encourage you to take a closer look at the details of the autoscaling algorithm.
From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
For example, if the current metric value is `200m` and the desired value is `100m`, the number of replicas will be doubled, since `200.0 / 100.0 == 2.0`. If the current value is instead `50m`, we'll halve the number of replicas, since `50.0 / 100.0 == 0.5`. We'll skip scaling if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, from the `--horizontal-pod-autoscaler-tolerance` flag, which defaults to 0.1).
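If it helps, here is a minimal sketch in plain Python (not the actual controller code) of how that formula behaves, including a simple tolerance check modelled on the flag mentioned above; the numbers just replay the documentation's examples:

```python
import math

# Illustrative sketch only: mirrors the quoted formula
# desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
def desired_replicas(current_replicas, current_metric, desired_metric, tolerance=0.1):
    ratio = current_metric / desired_metric
    # Skip scaling if the ratio is sufficiently close to 1.0 (the tolerance).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(4, 200, 100))  # 8 -> scale out (200m observed vs. 100m target)
print(desired_replicas(4, 50, 100))   # 2 -> scale in  (50m observed vs. 100m target)
print(desired_replicas(4, 105, 100))  # 4 -> within the 0.1 tolerance, no change
```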
As you can see, the desired number of replicas is calculated from the current metric value, and the autoscaling process is triggered only once this value reaches a certain critical point. So it's 100% reactive from this perspective.
Looking at the problem from a different point of view, the decision to use the Horizontal Pod Autoscaler is a kind of proactive approach. But now I'm talking about the user's approach to managing their infrastructure, not about the mechanism of the HPA itself as described above. Suppose you don't use the Horizontal Pod Autoscaler at all, you sometimes get unexpected load increases on the rigidly fixed set of pods your application is running on, and due to those increases your application often becomes unavailable.
If you administer such an environment manually, your reaction in such a situation is the decision to scale your `Deployment` out. You will probably agree with me that this is a totally reactive approach.
However, if you decide to use the HPA, you proactively anticipate the occurrence of such load increases. It gives you the possibility of always being one step ahead and reacting automatically before the situation occurs. So if you decide to scale your `Deployment` out when CPU usage reaches a certain threshold, e.g. 50% (still safe for the application, so it keeps running), the HPA handles the situation for you automatically, based on your predictions. The Horizontal Pod Autoscaler's own reaction remains reactive (a reaction to an exceeded threshold), but the autoscaling of your infrastructure at that moment is proactive, as the autoscaler steps into action before the situation becomes critical.
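To make that concrete, here is a hedged sketch of what such a 50% CPU threshold could look like, assuming (purely for illustration) the official Kubernetes Python client and a `Deployment` named `myapp`; it is the programmatic equivalent of `kubectl autoscale deployment myapp --cpu-percent=50 --min=2 --max=10`:

```python
from kubernetes import client, config

# Hypothetical example: the Deployment name, namespace and replica bounds
# are assumptions for illustration. Uses the autoscaling/v1 API.
config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="myapp"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="myapp",
        ),
        min_replicas=2,
        max_replicas=10,
        # Scale out once average CPU usage across the pods reaches 50%.
        target_cpu_utilization_percentage=50,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```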
I hope this has shed some light on the problem.