I was reading the example at kubernetes hpa example. In that example they run:

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80

So the pod asks for 200m of CPU (0.2 of a core). After that they create an HPA with a target CPU of 50%:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

which means the desired per-pod usage is 200m * 0.5 = 100m. They then run a load test that pushes utilization up to 305%, which means the autoscaler scales up to ceil((3.05 * 200m) / 100m) = 7 pods, according to the hpa scaling algorithm.
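For reference, the formula behind that number, as documented for the HPA, is:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

so starting from 1 replica at 305% measured utilization against the 50% target: ceil(1 * 305 / 50) = ceil(6.1) = 7, which matches the 7 pods above.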
This is all good, but we are experimenting with different values and I wonder if it's a good approach.
We opted for a target CPU of 500% (second option). To me, a target CPU >= 100% is a weird concept (maybe I'm misunderstanding it, please correct me, as I'm not that familiar with the whole concept), but it slows down scaling compared to the first option, as the calculation below shows.
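To make that concrete with the numbers from above (this is just my back-of-the-envelope calculation, so correct me if the reasoning is off): with a 200m request and a 500% target, the desired per-pod usage becomes 200m * 5 = 1000m, so the same 305% load (about 610m of actual usage) gives ceil(1 * 610 / 1000) = 1 replica, i.e. no scale-up at all, while the 50% target already scales out to 7 pods.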
The first approach is correct.
The second one is not a good approach, for a few reasons.
Another very important thing: when planning your Horizontal Pod Autoscaler, consider the total amount of resources available in your cluster so you don't find yourself in a situation where you run out of them.
Example: you have a system with a 2-core processor, which equals 2000 millicores available from the perspective of your cluster. Let's say you decided to create the following deployment:
kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=500m --expose --port=80
and then a Horizontal Pod Autoscaler:
kubectl autoscale deployment php-apache --cpu-percent=100 --min=1 --max=5
This means you allow more resources to be requested than you actually have available in your cluster, so in such a situation the 5th replica will never be scheduled.
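To put numbers on it: 5 replicas * 500m = 2500m of requested CPU, which is more than the ~2000m your node can allocate (in practice even a bit less, since system components also request some CPU), so the 5th pod will just sit in Pending state. A quick way to sanity-check this before picking --max (the exact output layout may differ between versions) is:

kubectl describe nodes

and compare the cpu value under "Allocatable" (and what is already listed under "Allocated resources") with maxReplicas * request.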