Resizing a Google Cloud Kubernetes cluster to zero not working

5/31/2018

I am trying to resize a Kubernetes cluster to zero nodes using:

gcloud container clusters resize $CLUSTER_NAME --size=0 --zone $ZONE

I get a success message, but the size of the node pool remains the same (I use only one node pool).

Is it possible to resize the cluster to zero?

-- gkatzioura
google-cloud-platform
kubernetes

1 Answer

6/1/2018

Sometimes you just need to wait 10-20 minutes before the autoscale operation takes effect.
In other cases, you may need to check whether the conditions for scaling down a node are met.
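To see whether the scale-down actually happens, you can watch the node list while you wait; a minimal sketch:

# Watch nodes leave the cluster as the autoscaler removes them.
kubectl get nodes --watch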

According to the autoscaler documentation:

Cluster autoscaler also measures the usage of each node against the node pool's total demand for capacity. If a node has had no new Pods scheduled on it for a set period of time, and all Pods running on that node can be scheduled onto other nodes in the pool, the autoscaler moves the Pods and deletes the node.

Note that cluster autoscaler works based on Pod resource requests, that is, how many resources your Pods have requested. Cluster autoscaler does not take into account the resources your Pods are actively using. Essentially, cluster autoscaler trusts that the Pod resource requests you've provided are accurate and schedules Pods on nodes based on that assumption.
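If you want to see what your Pods have requested (the numbers the autoscaler works from), you can read it back from the Pod spec; a minimal sketch, where my-pod is a placeholder name:

# Print the CPU/memory requests the autoscaler bases its decisions on.
# "my-pod" is a placeholder; substitute a real Pod name.
kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources.requests}'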

Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren't required to run your workloads. However, while a node pool can scale to a zero size, the overall cluster size does not scale down to zero nodes (as at least one node is always required to run system Pods).
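In practice this means the node pool must have autoscaling enabled with a minimum size of zero; a minimal sketch, assuming a node pool named default-pool and a maximum of 3 nodes:

# Enable autoscaling on the node pool with a minimum size of zero.
# "default-pool" and --max-nodes=3 are assumptions; adjust to your setup.
gcloud container clusters update $CLUSTER_NAME \
  --enable-autoscaling --min-nodes=0 --max-nodes=3 \
  --node-pool=default-pool --zone=$ZONE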

Cluster autoscaler has the following limitations:

  • When scaling down, cluster autoscaler supports a graceful termination period for a Pod of up to 10 minutes. A Pod is always killed after a maximum of 10 minutes, even if the Pod is configured with a higher grace period.

Note: Every change you make to the cluster autoscaler causes the Kubernetes master to restart, which takes several minutes to complete.

However, there are cases mentioned in the FAQ that can prevent CA from removing a node:

What types of pods can prevent CA from removing a node?

  • Pods with restrictive PodDisruptionBudget.
  • Kube-system pods that:
    • are not run on the node by default, *
    • don't have PDB or their PDB is too restrictive (since CA 0.6).
  • Pods that are not backed by a controller object (so not created by deployment, replica set, job, stateful set etc). *
  • Pods with local storage. *
  • Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc.)

* Unless the pod has the following annotation (supported in CA 1.0.3 or later):

"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"

How can I scale my cluster to just 1 node?

Prior to version 0.6, Cluster Autoscaler was not touching nodes that were running important kube-system pods like DNS, Heapster, Dashboard etc. If these pods landed on different nodes, CA could not scale the cluster down and the user could end up with a completely empty 3 node cluster. In 0.6, we added an option to tell CA that some system pods can be moved around. If the user configures a PodDisruptionBudget for the kube-system pod, then the default strategy of not touching the node running this pod is overridden with PDB settings. So, to enable kube-system pods migration, one should set minAvailable to 0 (or <= N if there are N+1 pod replicas). See also: I have a couple of nodes with low utilization, but they are not scaled down. Why?
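A minimal sketch of such a PodDisruptionBudget for kube-dns, assuming its pods carry the conventional k8s-app=kube-dns label:

# Allow CA to move kube-dns pods by permitting zero available replicas.
# The selector "k8s-app=kube-dns" is an assumption; verify your labels.
kubectl create poddisruptionbudget kube-dns-pdb \
  --namespace=kube-system \
  --selector=k8s-app=kube-dns \
  --min-available=0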

How can I scale a node group to 0?

From CA 0.6 for GCE/GKE and CA 0.6.1 for AWS, it is possible to scale a node group to 0 (and obviously from 0), assuming that all scale-down conditions are met.

For AWS, if you are using nodeSelector, you need to tag the ASG with a node-template key "k8s.io/cluster-autoscaler/node-template/label/".

For example, for a node label of foo=bar, you would tag the ASG with:

{ "ResourceType": "auto-scaling-group", "ResourceId": "foo.example.com", "PropagateAtLaunch": true, "Value": "bar", "Key": "k8s.io/cluster-autoscaler/node-template/label/foo" }

-- VAS
Source: StackOverflow