kubernetes hpa request cpu and target cpu values

7/19/2019

I was reading the example at kubernetes hpa example. In this example they run: kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80. So the pod asks for 200m of CPU (0.2 of a core). After that they create an HPA with a target CPU of 50%: kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10, which means the desired usage per pod is 200m * 0.5 = 100m. They run a load test that drives usage up to 305%, which means the autoscaler scales up to ceil((3.05 * 200m) / 100m) = 7 pods, according to the hpa scaling algorithm.
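The arithmetic above follows the HPA formula desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A minimal sketch of that calculation (the function name is mine, not from the Kubernetes docs):

```python
import math

def desired_replicas(current_replicas, current_utilization_pct, target_utilization_pct):
    """HPA scaling rule: ceil(currentReplicas * currentMetric / targetMetric).
    Utilization percentages are measured relative to the pod's CPU request."""
    return math.ceil(current_replicas * current_utilization_pct / target_utilization_pct)

# The docs example: 1 replica at 305% observed CPU with a 50% target.
print(desired_replicas(1, 305, 50))  # 7

# At 7 replicas the per-pod utilization drops to ~50%, so the count is stable.
print(desired_replicas(7, 50, 50))   # 7
```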

This is all good, but we are experimenting with different values and I wonder if it's a good approach.

2 options

We opted for a target CPU of 500% (the second option). To me, a target CPU >= 100% is a weird concept (maybe I'm misunderstanding it, please correct me as I'm not that familiar with the whole concept), but it slows down scaling compared to the inverted first option.

-- ThePainnn
autoscaling
kubernetes

1 Answer

7/31/2019

The first approach is correct.

The second one is not good for a few reasons:

  1. The decision to scale up is made too late, when the first Pod is already overloaded. If you give a Pod only 100 millicores of CPU but allow it to use 5 times that amount before a scale-up decision can be made, the system isn't very efficient: a load average of about 5 per core means that while 1 process is being served at a given moment, another 4 processes are waiting for CPU time.
  2. Scaling down the cluster isn't very effective either. Say the overall CPU usage in your cluster decreases by more than 400 millicores: with the second option that is still not enough to remove a single replica, whereas with the first option 4 replicas would already have been removed and the cluster scaled down.
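To make the first point concrete, here is a sketch of the absolute per-pod CPU level at which the HPA starts adding replicas, assuming (the question doesn't state the options explicitly) the second option requests 100m with a 500% target and the first is the inverted 500m request with a 100% target. The autoscaler's trigger point is the same in both cases; the difference is how much CPU the scheduler has actually reserved for the pod at that point:

```python
def scale_trigger_millicores(request_m, target_pct):
    """Absolute per-pod CPU usage (in millicores) at which the HPA begins
    scaling up: utilization is measured relative to the pod's CPU request."""
    return request_m * target_pct / 100

# Assumed second option: 100m request, 500% target.
# The pod runs at 5x its scheduler reservation before any scale-up.
print(scale_trigger_millicores(100, 500))  # 500.0

# Assumed first option: 500m request, 100% target.
# Same absolute trigger, but the scheduler has reserved the full 500m,
# so the node is not oversubscribed while the pod approaches it.
print(scale_trigger_millicores(500, 100))  # 500.0
```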

One more important thing: when planning your Horizontal Pod Autoscaler, consider the total amount of resources available in your cluster, so you don't find yourself in a situation where you run out of resources.

Example: you have a node with a 2-core processor, which equals 2000 millicores available from the perspective of your cluster. Let's say you decide to create the following deployment:

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=500m --expose --port=80

and then Horizontal Pod Autoscaler:

kubectl autoscale deployment php-apache --cpu-percent=100 --min=1 --max=5

This allows more resources to be requested (5 * 500m = 2500m) than you actually have available in your cluster (2000m), so in this situation the 5th replica will never be scheduled.
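The capacity check behind that conclusion is simple arithmetic (a sketch; variable names are mine):

```python
NODE_MILLICORES = 2000       # 2-core node as seen by the cluster
REQUEST_MILLICORES = 500     # CPU request per replica
MAX_REPLICAS = 5             # --max from the autoscale command

# How many replicas can the scheduler actually place on this node?
schedulable = NODE_MILLICORES // REQUEST_MILLICORES
print(schedulable)  # 4 -> the 5th replica would stay Pending

# The HPA's maximum over-commits the node:
print(MAX_REPLICAS * REQUEST_MILLICORES > NODE_MILLICORES)  # True
```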

-- mario
Source: StackOverflow