I have a Kubernetes cluster with one master and one worker. I installed metrics-server for autoscaling and then ran a stress test:
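Metrics availability can be sanity-checked first; a quick check, assuming metrics-server registers the usual v1beta1.metrics.k8s.io API service:

$ kubectl get apiservice v1beta1.metrics.k8s.io
$ kubectl top nodes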
$ kubectl run autoscale-test --image=ubuntu:16.04 --requests=cpu=1000m --command sleep 1800
deployment "autoscale-test" created
$ kubectl autoscale deployment autoscale-test --cpu-percent=25 --min=1 --max=5
deployment "autoscale-test" autoscaled
$ kubectl get hpa
NAME             REFERENCE                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0% / 25%   1         5         1          1m
$ kubectl get pod
NAME                              READY     STATUS    RESTARTS   AGE
autoscale-test-59d66dcbf7-9fqr8   1/1       Running   0          9m
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get update
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get install -y stress
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- stress --cpu 2 --timeout 600s &
stress: info: [227] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
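While the load runs, the scale-up can be watched with:

$ kubectl get hpa autoscale-test -w
$ kubectl top pods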
Everything worked fine and the deployment was autoscaled, but after the stress test finished, the pods created by the autoscaler did not terminate. The HPA now shows 0% CPU in use, yet all 5 autoscaled pods are still running:
$ kubectl get hpa
NAME             REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0%/25%    1         5         5          74m
$ kubectl get pods --all-namespaces
NAMESPACE   NAME                             READY     STATUS    RESTARTS   AGE
default     autoscale-test-8f4d84bbf-7ddjw   1/1       Running   0          61m
default     autoscale-test-8f4d84bbf-bmr59   1/1       Running   0          61m
default     autoscale-test-8f4d84bbf-cxt26   1/1       Running   0          61m
default     autoscale-test-8f4d84bbf-x9jws   1/1       Running   0          61m
default     autoscale-test-8f4d84bbf-zbhvk   1/1       Running   0          71m
I waited for an hour but nothing happened.
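To see what the autoscaler is deciding, its conditions and events can be dumped:

$ kubectl describe hpa autoscale-test

The Events section and the AbleToScale / ScalingActive conditions should show whether a scale-down is being considered or blocked.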
From the documentation:
--horizontal-pod-autoscaler-downscale-delay: The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
Note: When tuning these parameter values, a cluster operator should be aware of the possible consequences. If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes. However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
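Both of these are kube-controller-manager flags, not kubectl options. On a kubeadm cluster they would be set in the static pod manifest on the master node; a sketch, assuming the default kubeadm path and a Kubernetes version that uses the stabilization flag:

$ sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
  spec:
    containers:
    - command:
      - kube-controller-manager
      - --horizontal-pod-autoscaler-downscale-stabilization=5m0s
      ...

The kubelet picks up the manifest change and restarts the controller manager automatically. Older versions use --horizontal-pod-autoscaler-downscale-delay instead.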