Kubernetes HPA - How to avoid scaling-up for CPU utilisation spike

1/15/2020

HPA - How to avoid scaling up for a CPU utilization spike (not on startup). When the business configuration is loaded for a different country, CPU load increases for about 1 minute, but we want to avoid scaling up during that minute.

In the picture below, is currentMetricValue just the current value from a metric, or an average value over the period from the last poll to the current poll (--horizontal-pod-autoscaler-sync-period)?

(image unavailable)

-- Sandip Jadhav
horizontal-pod-autoscaling
kubernetes

1 Answer

1/15/2020

The default HPA check interval is 30 seconds. As you mentioned, this can be configured by changing the value of the --horizontal-pod-autoscaler-sync-period flag of the controller manager.

The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager’s --horizontal-pod-autoscaler-sync-period flag.

During each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).
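For context, the calculation the controller performs each period is documented as desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)], with a default 10% tolerance inside which no scaling happens. A minimal Python sketch of that calculation (the function name and tolerance parameter are illustrative, not a real Kubernetes API):

```python
import math

def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Approximate the documented HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue).
    If the ratio is within the tolerance band around 1.0, keep the current size."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # inside the tolerance band: no scaling
    return math.ceil(current_replicas * ratio)

# A 1-minute CPU spike to 200% of target doubles the recommendation:
print(desired_replicas(current_replicas=2, current_value=200, target_value=100))  # 4
# A small deviation stays inside the default 10% tolerance:
print(desired_replicas(current_replicas=4, current_value=105, target_value=100))  # 4
```

This is why a short spike can trigger a scale-up: if a poll lands inside the spike, the ratio is well outside the tolerance band and the replica count is raised immediately.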

To change or add flags in kube-controller-manager, you need access to the /etc/kubernetes/manifests/ directory on the master node so you can modify the parameters in /etc/kubernetes/manifests/kube-controller-manager.yaml.

Note: you are not able to do this on GKE, EKS, and other managed clusters.

Additionally, I recommend increasing --horizontal-pod-autoscaler-downscale-stabilization (the replacement for --horizontal-pod-autoscaler-upscale-delay).
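As a sketch, on a kubeadm-style cluster both flags go into the static-pod manifest mentioned above (the durations here are illustrative, not recommendations):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # ... existing flags ...
    - --horizontal-pod-autoscaler-sync-period=2m               # poll metrics less often
    - --horizontal-pod-autoscaler-downscale-stabilization=10m  # wait longer before scaling down
```

Since this is a static pod, the kubelet notices the manifest change and restarts the controller manager automatically.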

If you're worried about long outages, I would recommend setting up a custom metric (1 if the network was down in the last ${duration}, 0 otherwise) and setting the target value of the metric to 1 (in addition to CPU-based autoscaling). This way:

If the network was down in the last ${duration}, the recommendation based on the custom metric will be equal to the current size of your deployment. The max of this recommendation and the very low CPU recommendation will be equal to the current size of the deployment, so there will be no scale-downs until connectivity is restored (plus a few minutes after that, because of the scale-down stabilization window).

If the network is available, the recommendation based on the metric will be 0. Maxed with the CPU recommendation, it will be equal to the CPU recommendation and the autoscaler will operate normally. I think this solves your issue better than limiting the size of the autoscaling step: limiting the step size only slows down the rate at which the number of pods decreases, so a longer network outage will still shrink your deployment to the minimum allowed size.
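A sketch of what the combined HPA could look like with the autoscaling/v2beta2 API, assuming a custom-metrics adapter (e.g. Prometheus Adapter) already exposes a hypothetical per-pod network_down metric; the HPA takes the max of the per-metric recommendations:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp            # illustrative target
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods             # hypothetical custom metric: 1 while the network was down, 0 otherwise
    pods:
      metric:
        name: network_down
      target:
        type: AverageValue
        averageValue: "1"
```

While network_down is 1, the per-pod average equals the target, so this metric's recommendation is the current replica count, which prevents the scale-down described above.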

You can also use memory-based scaling.

Since it is not possible to create a memory-based HPA in Kubernetes, a script has been written to achieve the same. You can find the script here:

https://github.com/powerupcloud/kubernetes-1/blob/master/memory-based-autoscaling.sh

Clone the repository:

git clone https://github.com/powerupcloud/kubernetes-1.git

and then go to the Kubernetes directory. Execute the help command to get the instructions:

./memory-based-autoscaling.sh --help

Read more here: memory-based-autoscaling.

-- MaggieO
Source: StackOverflow