Currently, if we want to use Horizontal Pod Autoscaling in Kubernetes, we need to specify the following for the service we want to autoscale:
limits:
  cpu: 150m
  memory: 150Mi
requests:
  cpu: 42m
  memory: 50Mi
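For context, these values sit under the container spec of the workload that the HPA targets. A minimal sketch, assuming a Deployment called my-service (the name and image are placeholders, and the apiVersion may differ on older clusters):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
          - name: my-service
            image: my-service:latest   # placeholder image
            resources:
              requests:
                cpu: 42m
                memory: 50Mi
              limits:
                cpu: 150m
                memory: 150Mi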
I have several services that can all be scaled using HPA.
Can we over-provision these services? That is, can the sum of their resources go beyond the total resources available on the VM?
Consider a situation where the requests of the pods fit within the total available CPU, but the limits go beyond it.
For example:
The total available CPU is 1000m (1 core), and there are 2 pods with requests of 500m each and limits of 1000m each.
First: can I set the limit to 1000m for each pod if the total is only 1000m?
If yes? (Update 2: I think we can, as the experiment shown in the image below suggests.)
Now, if pod 2 is not using its whole 500m of CPU and pod 1 has reached its requested 500m,
can pod 1 go beyond 500m and use the CPU left idle by pod 2, since its limit is set to 1000m?
If no? (Update 2: I guess this is no longer relevant.)
Then I guess over-provisioning cannot be done?
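For illustration, each of the two pods in that scenario would carry a resources stanza along these lines, scheduled onto a node with 1000m of allocatable CPU:

    resources:
      requests:
        cpu: 500m     # the share the scheduler reserves for this pod
      limits:
        cpu: 1000m    # the ceiling this single pod may burst up to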
Let's start with a brief explanation of the autoscaling algorithm:
Once per 30 seconds (the default value of --horizontal-pod-autoscaler-sync-period), the autoscaler control loop queries pods and collects their CPU utilization. It then compares the arithmetic mean of this value with the configured threshold and adjusts the number of replicas to match the desired target CPU utilization. CPU utilization is the pod's average CPU usage over the last minute divided by the CPU requested by the pod. Currently, CPU usage is taken from the Heapster service (which should be present in the kube-system namespace).
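In other words, the controller roughly computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization); with 2 replicas, a 50% target, and a measured average of 90%, it would ask for ceil(2 * 90 / 50) = 4 replicas. The target itself is set on the HPA object; a minimal sketch, assuming the Deployment is called my-service:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-service
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-service
      minReplicas: 1
      maxReplicas: 5
      targetCPUUtilizationPercentage: 50   # threshold compared against average usage/request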
At this stage, resource requests, resource limits, and pod affinity are not used; we only get the desired number of replicas. Then the scheduler takes its part in the autoscaling process and starts scheduling pods according to that replica count. At this moment, resource requests, resource limits, and pod affinity are taken into account to decide which node the next pod replica will be deployed to.
According to the above, you can have several deployments that cannot all be scaled up to their maximum replica count at the same time. In case of an insufficient amount of resources, whichever deployment scales up first consumes the resources; any other pods that don't fit into the remaining resources won't be scheduled until resources become free again.
On GCP or GKE, you can use the cluster autoscaler to add new nodes to the cluster when you need more computing capacity and remove them when the load comes down. That helps avoid "overprovisioning", because you always have the desired amount of computing capacity, not more and not less.
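For reference, on GKE node autoscaling is enabled per node pool, roughly like this (the exact flag names may vary between gcloud versions, so treat this as a sketch):

    gcloud container clusters update my-cluster \
        --enable-autoscaling --min-nodes 1 --max-nodes 5 \
        --node-pool default-pool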
Update: The scheduler decides whether to run a pod based on the available resources, the default or configured limits set on the namespace, and pod affinity.
Limits apply to each particular pod and cap its resource consumption; they are not designed to limit the combined resource consumption of several pods.
A pod is started with the amount of resources specified in its request.
For example, if you have 1000m CPU on the node and a pod requests 500m with a limit of 1000m, the scheduler knows that the other 500m is still available, even if the pod is consuming resources up to its limit at that moment. So, on a node with 1000m CPU available, you can have two pods running, each with a request of 500m and a limit of 1000m.
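As for the default limits set on the namespace mentioned above, those typically come from a LimitRange object; a minimal sketch that would give every container in the namespace the 500m request / 1000m limit profile from the example unless it declares its own values:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-defaults          # placeholder name
    spec:
      limits:
      - type: Container
        defaultRequest:
          cpu: 500m               # applied as the CPU request when a container sets none
        default:
          cpu: 1000m              # applied as the CPU limit when a container sets none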