Starting from Kubernetes v1.18 the v2beta2 API allows scaling behavior to be configured through the Horizontal Pod Autoscalar (HPA) behavior field. I'm planning to apply HPA with custom metrics to a StatefulSet.
The use case I'm looking at is scaling out using a custom metric (e.g. number of user sessions on my application), but the HPA will not scale down at all. This use case is also described by K8s SIG-Autoscaling enhancements - "Configurable scale velocity for HPA >> Story 4: Scale Up As Usual, Do Not Scale Down".
behavior:
scaleDown:
policies:
- type: pods
value: 0
The user sessions could stay active for minutes to hours. Starting with 1 replica of the StatefulSet, as the number of user sessions hit an upper limit (exposed using Prometheus collector and later configured using HPA custom metric option), the application pods will scale-out. The new pods will start serving new users.
Since this is a StatefulSet and cannot just abruptly scale down, I'm seeking help on ways to scale down when the user sessions on the new replicas go down to 0. The above link says that the scale down can be controlled by a separate process. Not sure how to do this? Looking for some pointers.
Thanks.
You can use periodSeconds
and stabilizationWindowSeconds
values to manage how much time will pass between termination of pods, for example:
behavior:
scaleDown:
stabilizationWindowSeconds: 10
policies:
- type: Pods
value: 1
periodSeconds: 20
This way it will scale down 1 pod every ~30 seconds (or whatever value will be used in periodSeconds
and stabilizationWindowSeconds
). Time may vary depending on stabilizationWindowSeconds
values over time.
periodSeconds
describes how much time will pass between termination of each pod, maximum value is 1800 second (30 minutes).
stabilizationWindowSeconds
when metrics indicate that target should be scaled down, this algorithm takes a look into previously calculated desired states and uses highest value from specified interval. For scale down default value is 300, maximum value is 3600 (one hour).