For some reason Kubernetes 1.6.2 does not trigger autoscaling on Google Container Engine.
I have a someservice Deployment definition with the following resources and rolling-update strategy:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: someservice
  labels:
    layer: backend
spec:
  minReadySeconds: 160
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: someservice
        layer: backend
    spec:
      containers:
      - name: someservice
        image: eu.gcr.io/XXXXXX/someservice:v1
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 2
            memory: 20Gi
          requests:
            cpu: 400m
            memory: 18Gi
<.....>
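Note that with maxSurge: 100% and maxUnavailable: 0 the replacement pod has to be scheduled while the old one is still running, so every rollout temporarily needs room for an extra 400m CPU / 18Gi of requests somewhere in the cluster. For reference, the effective strategy can be read back from the live object roughly like this (a sketch; namespace and names as used in the commands below):
# sketch: read the effective rollout strategy back from the live Deployment
kubectl -n dev get deployment someservice \
  -o jsonpath='{.spec.strategy}{"\n"}'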
After changing the image version, the new pod cannot be scheduled and stays Pending:
$ kubectl -n dev get pods -l name=someservice
NAME READY STATUS RESTARTS AGE
someservice-2595684989-h8c5d 0/1 Pending 0 42m
someservice-804061866-f2trc 1/1 Running 0 1h
$ kubectl -n dev describe pod someservice-2595684989-h8c5d
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
43m 43m 4 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (4), Insufficient memory (3).
43m 42m 6 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (3), Insufficient memory (3).
41m 41m 2 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (2), Insufficient memory (3).
40m 36s 136 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (3).
43m 2s 243 cluster-autoscaler Normal NotTriggerScaleUp pod didn't trigger scale-up (it wouldn't fit if a new node is added)
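To compare against the per-node numbers further down, the Pending pod's requests can be printed directly (a rough sketch; pod name as above):
# sketch: show exactly what the Pending pod is requesting
kubectl -n dev get pod someservice-2595684989-h8c5d \
  -o jsonpath='{.spec.containers[0].resources.requests}{"\n"}'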
My node pool is set to autoscale with min: 2 and max: 5, and the machines in the pool (n1-highmem-8) are large enough (52 GB of memory) to accommodate this service. But somehow nothing happens:
$ kubectl get nodes
NAME STATUS AGE VERSION
gke-dev-default-pool-efca0068-4qq1 Ready 2d v1.6.2
gke-dev-default-pool-efca0068-597s Ready 2d v1.6.2
gke-dev-default-pool-efca0068-6srl Ready 2d v1.6.2
gke-dev-default-pool-efca0068-hb1z Ready 2d v1.6.2
$ kubectl describe nodes | grep -A 4 'Allocated resources'
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
7060m (88%) 15510m (193%) 39238591744 (71%) 48582818048 (88%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
6330m (79%) 22200m (277%) 48930Mi (93%) 66344Mi (126%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
7360m (92%) 13200m (165%) 49046Mi (93%) 44518Mi (85%)
--
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
7988m (99%) 11538m (144%) 32967256Ki (61%) 21690968Ki (40%)
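These percentages are relative to each node's allocatable capacity, which is a bit below the raw 8 vCPU / 52 GB of an n1-highmem-8 because of system reservations. With requests already at 79-99% CPU and 61-93% memory, none of the existing nodes has a spare 400m CPU plus 18Gi, while a fresh node clearly would, so I expected a scale-up. A sketch for printing allocatable capacity per node:
# sketch: allocatable CPU and memory per node, i.e. what the scheduler can actually place requests against
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.cpu}{"\t"}{.status.allocatable.memory}{"\n"}{end}'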
$ gcloud container node-pools describe default-pool --cluster=dev
autoscaling:
  enabled: true
  maxNodeCount: 5
  minNodeCount: 2
config:
  diskSizeGb: 100
  imageType: COS
  machineType: n1-highmem-8
  oauthScopes:
  - https://www.googleapis.com/auth/compute
  - https://www.googleapis.com/auth/datastore
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/devstorage.read_write
  - https://www.googleapis.com/auth/service.management.readonly
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/sqlservice
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring
  serviceAccount: default
initialNodeCount: 2
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/XXXXXX/zones/europe-west1-b/instanceGroupManagers/gke-dev-default-pool-efca0068-grp
management:
  autoRepair: true
name: default-pool
selfLink: https://container.googleapis.com/v1/projects/XXXXXX/zones/europe-west1-b/clusters/dev/nodePools/default-pool
status: RUNNING
version: 1.6.2
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Make sure the GCE instance group autoscaler is either disabled or configured with proper minimum/maximum instance counts.
According to the Kubernetes Cluster Autoscaler FAQ:
CPU-based (or any metric-based) cluster/node group autoscalers, like GCE Instance Group Autoscaler, are NOT compatible with [Kubernetes Cluster Autoscaler]. They are also not particularly suited to use with Kubernetes in general.
...so it should probably be disabled.
Try:
gcloud compute instance-groups managed describe gke-dev-default-pool-efca0068-grp \
--zone europe-west1-b
Then check the autoscaler property in the output; it will be absent if the instance group autoscaler is disabled.
To disable it, do:
gcloud compute instance-groups managed stop-autoscaling gke-dev-default-pool-efca0068-grp \
--zone europe-west1-b
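Afterwards the describe output should no longer contain an autoscaler property. A quick way to double-check (a sketch using a gcloud output projection; it should print nothing once the autoscaler is gone):
# sketch: prints the attached autoscaler, or nothing if none is attached
gcloud compute instance-groups managed describe gke-dev-default-pool-efca0068-grp \
  --zone europe-west1-b --format='value(autoscaler)'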
So it seems this is a bug in Kubernetes 1.6.2. According to a GKE support engineer:
From the messages "No nodes are available that match all of the following predicates", this seems to be a known issue and the engineers managed to track down the root cause. It was an issue in cluster autoscaler version 0.5.1 that is currently used in GKE 1.6 (up to 1.6.2). This issue had been fixed already in cluster autoscaler 0.5.2, which is included in head for the 1.6 branch.
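Since the fixed cluster autoscaler ships bundled with newer GKE releases, the practical fix is to upgrade the cluster once a patched 1.6.x version is offered. A sketch (the exact patched GKE version is not named in the support reply, so check what is available first):
# sketch: list the versions GKE currently offers, then upgrade master and nodes
gcloud container get-server-config --zone europe-west1-b
gcloud container clusters upgrade dev --zone europe-west1-b --master
gcloud container clusters upgrade dev --zone europe-west1-b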