kubernetes: Node not scaled down by Cluster Autoscaler despite low usage

9/23/2019

Here is the allocation status (based on requests) of one of my nodes:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        374m (4%)   3151m (39%)
  memory                     493Mi (1%)  1939Mi (7%)
  ephemeral-storage          0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0

Despite the low usage, I would expect the node to be scaled down by the Cluster Autoscaler, which is enabled on this cluster.

However, it is not.
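
A quick way to see what the autoscaler thinks of the node is to inspect the node's events and the status ConfigMap the autoscaler maintains. A minimal sketch, assuming status reporting is enabled (the default) and using the anonymized node name from the pod list below, so substitute your own:

# Events recorded against the node often explain skipped scale-downs
kubectl describe node gke-my-node-name

# The autoscaler publishes per-node scale-down status here,
# assuming --write-status-configmap is left at its default (true)
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml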

Here are the pods running on the node:

Non-terminated Pods:         (7 in total)
  Namespace                  Name                                                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                              ------------  ----------  ---------------  -------------  ---
  extra-services             external-dns-cfd4bb858-fvpfj                                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         149m
  istio-system               istio-galley-65987fccb-prxk6                                      10m (0%)      0 (0%)      0 (0%)           0 (0%)         121m
  istio-system               istio-policy-76ddd9fc97-pkxhh                                     110m (1%)     2 (25%)     128Mi (0%)       1Gi (3%)       149m
  kube-system                fluentd-gcp-v3.2.0-7mndl                                          100m (1%)     1 (12%)     200Mi (0%)       500Mi (1%)     5h20m
  kube-system                kube-proxy-gke-my-node-name                                       100m (1%)     0 (0%)      0 (0%)           0 (0%)         5h20m
  kube-system                metrics-server-v0.3.1-8675cc4d57-xg9qt                            53m (0%)      148m (1%)   145Mi (0%)       395Mi (1%)     120m
  kube-system                prometheus-to-sd-n2jfq                                            1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      5h20m

and here are my daemonsets:

k get ds --all-namespaces
NAMESPACE     NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
kube-system   fluentd-gcp-v3.2.0         14        14        14      14           14          beta.kubernetes.io/fluentd-ds-ready=true       226d
kube-system   metadata-proxy-v0.1        0         0         0       0            0           beta.kubernetes.io/metadata-proxy-ready=true   226d
kube-system   nvidia-gpu-device-plugin   0         0         0       0            0           <none>                                         226d
kube-system   prometheus-to-sd           14        14        14      14           14          beta.kubernetes.io/os=linux                    159d

Why isn't the node scaling down?

edit: This is what I get when trying to drain the node manually:

cannot delete Pods with local storage (use --delete-local-data to override): istio-system/istio-policy-76ddd9fc97-pkxhh
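
That error points at the likely blocker: by default the Cluster Autoscaler refuses to evict pods that use local storage (emptyDir volumes), so the istio-policy pod pins the node. A common workaround, sketched here under the assumption that istio-policy is a Deployment in istio-system (names taken from the pod above and may differ in your cluster), is to mark its pod template as safe to evict:

# Add the annotation the autoscaler checks before evicting pods with
# local storage; pods created from the updated template become evictable
kubectl patch deployment istio-policy -n istio-system -p \
  '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'

Note that updating the template restarts the pod; for a one-off manual drain, kubectl drain --delete-local-data (as the error message suggests) achieves the same thing.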
-- pkaramol
google-kubernetes-engine
kubernetes

1 Answer

9/23/2019

Node autoscaling is driven by scheduling. When the scheduler cannot fit a pod on any existing node, the autoscaler scales the cluster up and the pod is scheduled on the new node. Scale-down is the mirror image: the autoscaler removes a node only once it has stayed underutilized for a certain amount of time and every pod on it can safely be rescheduled elsewhere. Pods that cannot be evicted block this, and your drain error shows exactly such a pod: istio-system/istio-policy uses local storage, which the autoscaler will not evict by default. You can find out more about this here.
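
For reference, the "certain amount of time" and the utilization bar are configurable on a self-managed autoscaler. A sketch of the relevant flags with their defaults (on GKE these are managed by the service and not directly settable):

# A node becomes a scale-down candidate when the sum of its pods'
# requests drops below this fraction of the node's allocatable capacity
--scale-down-utilization-threshold=0.5

# ...and it is only removed after staying unneeded for this long
--scale-down-unneeded-time=10m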

-- Spazzy757
Source: StackOverflow