GKE cluster node pool is not autoscaling

11/11/2019

We've created a GKE cluster in europe-west2, spread across zones a and b. The cluster is configured as follows:

Number of nodes: 1 per zone (2 in total)
Autoscaling: Yes (1-4 nodes per zone)
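
For reference, a roughly equivalent gcloud invocation would look like this (the cluster name is a placeholder, not our real one):

gcloud container clusters create my-cluster \
    --region europe-west2 \
    --node-locations europe-west2-a,europe-west2-b \
    --num-nodes 1 \
    --enable-autoscaling --min-nodes 1 --max-nodes 4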

When we test the autoscaling, the cluster fails to schedule the pods and does not add any additional nodes. The logs show:

W 2019-11-11T14:03:17Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
W 2019-11-11T14:03:20Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:42Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
(last message repeated 29 times)
W 2019-11-11T14:04:44Z 0/4 nodes are available: 4 Insufficient cpu. 
(last message repeated 13 times between 14:04:44 and 14:04:45)
W 2019-11-11T14:04:51Z unable to get metrics for resource cpu: no metrics returned from resource metrics API 
I 2019-11-11T14:04:53Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
I 2019-11-11T14:05:03Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached
(last message repeated 4 times)

About 80% of our pods are now unschedulable and showing an error state, but we never see the cluster grow, either in node size or node count.

We started with a two-node setup and ran a load test to push it to the maximum. CPU reached 100% and RAM reached 95% on both nodes. We got these messages:

I 2019-11-11T16:01:21Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
(last message repeated 3 times)
I 2019-11-11T16:01:21Z Ensuring load balancer 
W 2019-11-11T16:01:24Z Error creating load balancer (will retry): failed to ensure load balancer for service istio-system/istio-ingressgateway: failed to ensure a static IP for load balancer (a72c616b7f5cf11e9b4694201ac10480(istio-system/istio-ingressgateway)): error getting static IP address: googleapi: Error 404: The resource 'projects/gc-lotto-stage/regions/europe-west2/addresses/a72c616b7f5cf11e9b4694201ac10480' was not found, notFound 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:25Z missing request for cpu 
W 2019-11-11T16:01:26Z missing request for cpu 
I 2019-11-11T16:01:31Z pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 max cluster cpu, memory limit reached 
W 2019-11-11T16:01:35Z missing request for cpu 
W 2019-11-11T16:01:44Z 0/2 nodes are available: 2 Insufficient cpu. 
(last message repeated 2 times)
-- Lewis
google-kubernetes-engine
kubernetes

2 Answers

1/22/2020

I had the same issue for a while, and after a lot of research and trial I found that you have to keep a few things in mind if you want to achieve cluster autoscaling in GKE:

  1. Set resource requests and limits for every workload possible.

  2. Autoscaling works on requests, not on limits. So you will only see the cluster scale up when the sum of the requests of your workloads exceeds the total resources available in the node pool (see the example manifest below).
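
As an example, here is a minimal Deployment sketch with requests and limits set; the name, image, and values are illustrative only, not from the question:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: nginx:1.17        # placeholder image
        resources:
          requests:              # the scheduler and cluster autoscaler act on these
            cpu: 250m
            memory: 256Mi
          limits:                # limits only cap runtime usage; they do not drive scale-up
            cpu: 500m
            memory: 512Mi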

This did the trick for me.

Hope it helps.

-- shashank patel
Source: StackOverflow

1/22/2020

It also depends on the configured node size:

First, look at the node's allocatable resources:

kubectl describe node <node>
Allocatable:
  cpu:                4
  ephemeral-storage:  17784772Ki
  hugepages-2Mi:      0
  memory:             4034816Ki
  pods:               110

Also check the resources that are already allocated, shown in the same kubectl describe node <node> output:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1505m (37%)   3 (75%)
  memory             2750Mi (69%)  6484Mi (164%)
  ephemeral-storage  0 (0%)        0 (0%)

Then look at the pod resource requests:

If a pod's CPU or memory request is larger than the allocatable resources of any node the autoscaler could add, the cluster will not scale up, because the pod would not fit even on a new node. Make sure the nodes have sufficient capacity for the pod requests.
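
To find the pods that are stuck and the reason, a quick check with standard kubectl commands:

# list pods that are stuck in Pending
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# inspect one of them; the Events section shows messages such as
# "0/4 nodes are available: 4 Insufficient cpu."
kubectl describe pod <pending-pod>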

Note that the allocatable resources are less than the node's actual capacity, because the system reserves a fraction of the capacity for system daemons.
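
You can compare the two directly; a sketch using jsonpath, where the node name is a placeholder:

kubectl get node <node> -o jsonpath='capacity: {.status.capacity.cpu} cpu, {.status.capacity.memory}{"\n"}allocatable: {.status.allocatable.cpu} cpu, {.status.allocatable.memory}{"\n"}'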

-- suresh Palemoni
Source: StackOverflow