I have a Kubernetes cluster running a workload with a request of 1000m CPU and a limit of 1200m CPU. The node template is 4-8096, and we never hit RAM limits since the workloads are compute-intensive.
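For reference, those resource settings correspond to a container spec roughly like the sketch below (the deployment and container names are placeholders, not from the actual cluster):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-workload    # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: compute-workload
  template:
    metadata:
      labels:
        app: compute-workload
    spec:
      containers:
      - name: worker        # placeholder name
        image: example/worker:latest   # placeholder image
        resources:
          requests:
            cpu: "1000m"    # request: 1 CPU, used by the scheduler for bin-packing
          limits:
            cpu: "1200m"    # limit: 1.2 CPU, enforced at runtime
```

The scheduler places pods based on the requests, not the limits, which is why the 1000m request is the number that matters for node utilization here.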
My issue is that, as shown in the picture, when the autoscaler scaled the workload to 2 pods, Kubernetes didn't schedule the additional pod on the same node even though plenty of resources were available (2/3.92 CPUs requested). Instead it scheduled it on a new node. This wastes a lot of resources and is cost-inefficient when we scale further up.
Is this normal behavior, or what best practices can you recommend to achieve better resource utilization?
Thanks. Nowa.
UPDATE:
After setting the autoscaling profile to optimize-utilization as suggested in Erhard Czving's answer, the additional pod was scheduled onto the same node. Total requests are now 3/3.92 ≈ 76%.
Try the optimize-utilization autoscaling profile.
It should keep utilization much higher than the default profile, depending on the Kubernetes version used; around 80% utilization is a good estimate.
Apply to a cluster with gcloud commands:
gcloud beta container clusters update example-cluster --autoscaling-profile optimize-utilization
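If it helps, you can check afterwards that the profile took effect with a describe call. This is a sketch; the `autoscaling.autoscalingProfile` format path is an assumption about how the field is exposed on the Cluster resource, so adjust if your gcloud version lays it out differently:

```
# Apply the profile (beta command, as above)
gcloud beta container clusters update example-cluster \
    --autoscaling-profile optimize-utilization

# Verify -- assumes the profile is exposed at autoscaling.autoscalingProfile
gcloud container clusters describe example-cluster \
    --format="value(autoscaling.autoscalingProfile)"
```

With the optimize-utilization profile, the cluster autoscaler prefers packing pods onto fewer nodes and scales down underutilized nodes more aggressively, at the cost of less spare headroom for sudden load spikes.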