GKE cluster not scaling down

8/6/2019

I have autoscaling enabled on my Google Kubernetes Engine cluster, and I can see that the usage on one of the nodes is much lower than on the rest.


I have a total of 6 nodes and I expect at least this node to be terminated. I have gone through the following: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

I have added this annotation to all my pods

cluster-autoscaler.kubernetes.io/safe-to-evict: true
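For context, the annotation sits in the pod template metadata of each Deployment, and the value is the quoted string "true" (annotation values are strings in YAML). A trimmed sketch — only the mydocs name comes from the drain output further down; the labels, replica count and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydocs
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mydocs
  template:
    metadata:
      labels:
        app: mydocs
      annotations:
        # value must be the string "true", not a bare boolean
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
      - name: mydocs
        image: gcr.io/my-project/mydocs:latest   # placeholder image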

The cluster autoscaler scales up correctly, but it doesn't scale down as I expect it to.

I have the following logs

$ kubectl  logs kube-dns-autoscaler-76fcd5f658-mf85c -n kube-system

autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:36.187949       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:47.191061       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
I0628 20:35:10.248636       1 autoscaler_server.go:133] ConfigMap not found: Get https://10.55.240.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns-autoscaler: net/http: TLS handshake timeout, will create one with default params
E0628 20:35:17.356197       1 autoscaler_server.go:95] Error syncing configMap with apiserver: configmaps "kube-dns-autoscaler" already exists
E0628 20:35:18.191979       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: i/o timeout

I am not sure the above are the relevant logs. What is the correct way to debug this issue?

My pods have local storage. I have been trying to debug this issue using:

kubectl drain  gke-mynode-d57ded4e-k8tt

error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v3.1.1-qzdzs, prometheus-to-sd-snqtn; pods with local storage (use --delete-local-data to override): mydocs-585879b4d5-g9flr, istio-ingressgateway-9b889644-v8bgq, mydocs-585879b4d5-7lmzk
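For completeness, the error itself lists the overrides for a manual drain, although forcing the drain by hand doesn't change what the autoscaler will do on its own:

kubectl drain gke-mynode-d57ded4e-k8tt --ignore-daemonsets --delete-local-data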

I think it's safe to ignore the DaemonSet pods, since the CA should be able to evict around them, but I am not sure how to make the CA understand that mydocs is OK to evict and reschedule onto another node, even after adding the annotation.
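As a sanity check, the annotation can be confirmed on a running pod (pod name taken from the drain output above):

kubectl get pod mydocs-585879b4d5-g9flr -o jsonpath='{.metadata.annotations}'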

EDIT

The minimum and maximum number of nodes have been set correctly, as seen in the GCP console.

-- kosta
autoscaling
google-kubernetes-engine
kubernetes

1 Answer

8/6/2019

The kubectl logs command is for the DNS autoscaler, not the cluster autoscaler. It will give you information on the number of kube-dns replicas in the cluster, not the number of nodes or scaling decisions.

From the cluster autoscaler FAQ (and taking into account what you wrote in your question), the pods that can prevent a node from being removed are:

  • kube-system pods that are not run on the node by default
  • pods with local storage

Additionally, restrictive Pod Disruption Budgets can block scale-down. However, since none are mentioned in the question, I'll assume you haven't set any.

Although you have pods with local storage, you added the annotation to mark them as safe to evict, so that leaves the kube-system pods that are not run on the nodes by default.

Since system pods in GKE are managed by a reconciliation loop, you can't add this annotation to them (any change gets reverted), which might be what's preventing their eviction.

In this scenario, you may consider using a Pod Disruption Budget configured to allow the autoscaler to evict them.

This Pod Disruption Budget can cover the DNS and logging pods that aren't run on the nodes by default.
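As a sketch of what that could look like for kube-dns (assuming the standard k8s-app: kube-dns label GKE uses; adjust the selector and budget for other system pods):

apiVersion: policy/v1beta1        # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1               # lets the autoscaler evict one replica at a time
  selector:
    matchLabels:
      k8s-app: kube-dns

With maxUnavailable: 1 the cluster autoscaler is allowed to evict a kube-dns replica and reschedule it elsewhere, which is often enough for it to drain an underused node.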

Unfortunately, GKE is a managed offering, so there isn't much more from the autoscaler FAQ that you can apply directly. However, if you want to go further, you can also consider a pod bin-packing strategy, using affinity and anti-affinity, taints and tolerations, and requests and limits to fit pods onto nodes more tightly and make scale-down easier whenever possible.
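For example, giving each container explicit requests and limits helps the scheduler pack pods tightly and gives the autoscaler an accurate view of node utilization. The values below are placeholders to size against your own workload, inside each container spec:

resources:
  requests:
    cpu: 100m        # placeholder; measure your real usage
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi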

Finally, on GKE you can use the cluster-autoscaler-status ConfigMap to check what decisions the autoscaler is making.
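For example, assuming your cluster version exposes it in kube-system:

kubectl describe configmap cluster-autoscaler-status -n kube-system

It shows, per node group, the autoscaler's health and whether it currently sees any scale-down candidates, which usually makes it clear why a node is being kept.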

-- yyyyahir
Source: StackOverflow