Kubernetes cluster autoscaler does not seem to work on GKE?

6/15/2018

I've got a node pool defined with min instances set to 1 and max instances set to 5, and autoscaling enabled.

However, it does not seem to be scaling down.

  • I have cordoned a node.
  • It has been over 12 hours.
  • There are no pending pods.
  • Removing a node would not reduce the number of replicas of my own deployment.
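
For context, I gathered the pod list below with a command along these lines (the node name is a placeholder):

kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>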

The node in question has the following pods running on it:

  • fluentd
  • kube-dns
  • kube-proxy-gke
  • metrics-server
  • redis

All of the pods above are in the kube-system namespace, except the redis pod, which is defined in a DaemonSet.

Is there any additional configuration required? A PodDisruptionBudget, perhaps?
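
For example, would a minimal PodDisruptionBudget along these lines be needed for the system pods? This is only a sketch on my part; the name and the kube-dns selector label are assumptions:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb        # hypothetical name
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns     # assumed label on the kube-dns pods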

Output of kubectl describe -n kube-system configmap cluster-autoscaler-status:

Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated=2018-06-15 10:40:16.289611397 +0000 UTC

Data
====
status:
----
Cluster-autoscaler status at 2018-06-15 10:40:16.289611397 +0000 UTC:
Cluster-wide:
  Health:      Healthy (ready=4 unready=0 notStarted=0 longNotStarted=0 registered=4 longUnregistered=0)
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:17:56.845900388 +0000 UTC
  ScaleUp:     NoActivity (ready=4 registered=4)
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:18:55.777577792 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:39:03.33504599 +0000 UTC

NodeGroups:
  Name:        https://content.googleapis.com/compute/v1/projects/gcpwp-ayurved-subs-staging/zones/europe-west1-b/instanceGroups/gke-wordpress-preempt-nodes-9c33afcb-grp
  Health:      Healthy (ready=3 unready=0 notStarted=0 longNotStarted=0 registered=3 longUnregistered=0 cloudProviderTarget=3 (minSize=2, maxSize=3))
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:17:56.845900388 +0000 UTC
  ScaleUp:     NoActivity (ready=3 cloudProviderTarget=3)
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:18:55.777577792 +0000 UTC
  ScaleDown:   NoCandidates (candidates=0)
               LastProbeTime:      2018-06-15 10:40:14.942263061 +0000 UTC
               LastTransitionTime: 2018-06-15 09:39:03.33504599 +0000 UTC


Events:  <none>
-- Chris Stryczynski
google-kubernetes-engine
kubernetes

2 Answers

7/30/2019

Also, as stated in the GKE FAQ, a node will not be scaled down until the sum of the CPU and memory requests of all pods running on it is smaller than 50% of the node's allocatable capacity. See here for a duplicate question.
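
As a concrete illustration of that rule: on a node with 2 vCPU and 4 GiB allocatable, scale-down would only be considered once the summed pod requests drop below 1 vCPU and 2 GiB. The current sums are visible in the "Allocated resources" section of the node description (node name is a placeholder):

kubectl describe node <node-name>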

-- Aleksi
Source: StackOverflow

6/18/2018

There are a few constraints that could prevent the node from scaling down.

You should check the pods you listed one by one against the "What types of pods can prevent CA from removing a node?" section of the documentation. This should help you identify whether one of them is blocking the scale-down.

If it is indeed the redis pod, you could try adding the safe-to-evict annotation:

"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"

If it is one of the system pods, I would try the same thing on the other nodes to see whether scale-down works there. According to the GKE documentation, you should be able to scale a cluster down to 1 node, or scale a specific node pool down to zero.
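
If you want to allow the pool to scale further down, the autoscaling bounds can be lowered; a sketch with placeholder cluster and pool names (the zone is taken from your status output):

gcloud container clusters update CLUSTER_NAME \
    --enable-autoscaling --min-nodes 0 --max-nodes 5 \
    --node-pool POOL_NAME --zone europe-west1-b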

-- Sergey Bahchissaraitsev
Source: StackOverflow