Kubernetes cluster in Google (GKE) is over-scaling nodes

6/26/2018

I have a Kubernetes cluster running in GCP with several node pools and auto-scaling enabled. It seems like the auto-scaler is over-scaling. Attached is the list of nodes and their usage (I'm using a 32-CPU machine type).

I can't understand why the auto-scaler doesn't consolidate some of these machines: their usage is far below capacity, and the pods spread across several machines could easily fit on one.

I'm not using any special taints or affinities, so I can't see why this over-scaling happens. Reading the node auto-scaler documentation doesn't explain this behavior either.

Columns: status, requested CPU, total CPU, requested RAM, total RAM

[screenshot: nodes list]
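
For reference, the requested-vs.-allocatable numbers above can be reproduced directly with kubectl; a rough sketch (the grep context size is only an approximation of the "Allocated resources" block):

    # Dump each node's "Allocated resources" section (CPU/memory requests vs. allocatable)
    kubectl describe nodes | grep -A 8 "Allocated resources"

    # List which pods landed on which node, to see what would have to move
    kubectl get pods --all-namespaces -o wide --sort-by=.spec.nodeName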

-- Idan
google-compute-engine
google-kubernetes-engine
kubernetes

1 Answer

7/19/2018

As you confirmed, you may have some kube-system pods running on those nodes that prevent the autoscaler from removing them; please see this.
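
If that is the cause, the Cluster Autoscaler FAQ suggests giving the blocking kube-system pod a PodDisruptionBudget so it can be evicted during scale-down. A minimal sketch, assuming the blocker is kube-dns (the label selector is an assumption; check the labels on your own blocking pod first):

    # List non-DaemonSet kube-system pods and the nodes they run on
    kubectl get pods -n kube-system -o wide

    # Allow the autoscaler to evict kube-dns by giving it a PodDisruptionBudget
    # (the k8s-app=kube-dns selector is an assumption; verify your pod's labels)
    kubectl create poddisruptionbudget kube-dns-pdb \
      --namespace=kube-system \
      --selector=k8s-app=kube-dns \
      --max-unavailable=1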

Regarding the logs of the cluster-autoscaler under GKE, unfortunately I don't think you have such access. If you did have access to the master machine, you could check the Cluster Autoscaler logs in /var/log/cluster-autoscaler.log. The Cluster Autoscaler logs a lot of useful information, including why it considers a pod unremovable or what its scale-up plan was. For more info, you can look at this.
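
That said, from inside the cluster you may still be able to read the autoscaler's status ConfigMap in kube-system; whether GKE publishes it can depend on the cluster version, so treat this as an assumption:

    # The autoscaler periodically writes its per-node-group status
    # (scale-up/scale-down activity, candidate nodes) to this ConfigMap
    kubectl get configmap cluster-autoscaler-status \
      -n kube-system -o yaml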

Please note that it is entirely possible for a node to be underutilized while its pods would not fit anywhere else; that alone can be the reason the node can't be removed. The logic is documented here.
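
To see whether a single pod is what pins an underutilized node, compare each pod's requests with the free capacity on the other nodes; if a pod really is safe to move, you can say so explicitly with the safe-to-evict annotation (the deployment name below is a placeholder):

    # Show each pod's CPU/memory requests, to spot the one that won't fit elsewhere
    kubectl get pods --all-namespaces \
      -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory

    # Explicitly mark a workload's pods as evictable during scale-down
    # ("my-deployment" is a placeholder name)
    kubectl patch deployment my-deployment \
      -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'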

-- mehdi sharifi
Source: StackOverflow