I am trying to set up auto-provisioning on Google's Kubernetes service GKE. I created a cluster with both auto-scaling and auto-provisioning like so:
gcloud beta container clusters create "some-name" --zone "us-central1-a" \
--no-enable-basic-auth --cluster-version "1.13.11-gke.14" \
--machine-type "n1-standard-1" --image-type "COS" \
--disk-type "pd-standard" --disk-size "100" \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" --enable-stackdriver-kubernetes --enable-ip-alias \
--network "projects/default-project/global/networks/default" \
--subnetwork "projects/default-project/regions/us-central1/subnetworks/default" \
--default-max-pods-per-node "110" \
--enable-autoscaling --min-nodes "0" --max-nodes "8" \
--addons HorizontalPodAutoscaling,KubernetesDashboard \
--enable-autoupgrade --enable-autorepair \
--enable-autoprovisioning --min-cpu 1 --max-cpu 8 --min-memory 1 --max-memory 16
The cluster has 1 node pool with 1 node that has 1 vCPU. I then ran a deployment requesting 4 vCPUs, which clearly cannot be satisfied by the current node pool.
kubectl run say-lol --image ubuntu:18.04 --requests cpu=4 -- bash -c 'echo lolol'
Here is what I want to happen: The auto-scaler should fail to accommodate the new deployment, as the existing node pool doesn't have enough CPU. The auto-provisioner should try to create a new node pool with a new node of 4 vCPU to run the new deployment.
Here is what is actually happening: the auto-scaler fails as expected, but the auto-provisioner does nothing. The pod remains Pending indefinitely, and no new node pools get created.
$ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
50s Warning FailedScheduling Pod 0/1 nodes are available: 1 Insufficient cpu.
4m7s Normal NotTriggerScaleUp Pod pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 Insufficient cpu
9m17s Normal SuccessfulCreate ReplicaSet Created pod: say-lol-5598b4f6dc-vz58k
9m17s Normal ScalingReplicaSet Deployment Scaled up replica set say-lol-5598b4f6dc to 1
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
say-lol-5598b4f6dc-vz58k 0/1 Pending 0 9m14s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-some-name-default-pool-4ec86782-bv5t Ready <none> 31m v1.13.11-gke.14
Why isn't a new node pool getting created to run the new deployment?
EDIT: It seems cpu=4 is the problematic part. If I change it to cpu=1.5, it works: a new node pool is created and the pods start running. However, I specified --max-cpu 8, so it should clearly be able to handle 4 vCPUs.
The issue is likely related to allocatable CPU; check the machine type that actually gets created. Specifying --max-cpu 8 does not mean that a new node will have 8 cores. Instead, it caps the total number of cores across the whole cluster. A 4-core node reserves part of its CPU for system overhead, so its allocatable CPU falls just short of 4, while an 8-core node, added on top of the existing 1-core node, would push the cluster past the 8-core cap. Changing to --max-cpu 40 should give better results, as it allows a bigger machine type to be created.
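For intuition, here is a rough sketch of why a pod requesting a full 4 CPUs cannot land on a 4-core node: GKE reserves a slice of each node's CPU for system daemons. The percentages below follow Google's documented kube-reserved formula, but treat them as an approximation rather than exact values for every GKE version.

```python
def allocatable_cpu(cores: int) -> float:
    """Approximate allocatable CPU on a GKE node with the given core count.

    Assumed reservation formula (per Google's docs, approximate):
      6% of the first core, 1% of the second core,
      0.5% of cores 3-4, 0.25% of every core above 4.
    """
    reserved = 0.0
    for i in range(1, cores + 1):
        if i == 1:
            reserved += 0.06
        elif i == 2:
            reserved += 0.01
        elif i <= 4:
            reserved += 0.005
        else:
            reserved += 0.0025
    return cores - reserved

# An n1-standard-4 has less than 4 allocatable CPUs, so a pod
# requesting cpu=4 cannot fit on it:
print(round(allocatable_cpu(4), 2))  # 3.92

# An n1-standard-8 would fit the pod, but 8 new cores plus the
# existing 1-core node exceeds the cluster-wide --max-cpu 8 limit:
print(round(allocatable_cpu(8), 2))  # 7.91
```

This matches the observed behavior: cpu=1.5 fits within the 3.92 allocatable CPUs of a 4-core node, while cpu=4 forces the auto-provisioner toward a machine type that the cluster-wide cap forbids.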