On my GCE Kubernetes cluster I can no longer create pods.
Warning FailedScheduling pod (www.caveconditions.com-f1be467e31c7b00bc983fbe5efdbb8eb-438ef) failed to fit in any node
fit failure on node (gke-prod-cluster-default-pool-b39c7f0c-c0ug): Insufficient CPU
Looking at the allocated stats of that node:
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default dev.caveconditions.com-n80z8 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default lamp-cnmrc 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default mongo-2-h59ly 200m (20%) 0 (0%) 0 (0%) 0 (0%)
default www.caveconditions.com-tl7pa 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-cloud-logging-gke-prod-cluster-default-pool-b39c7f0c-c0ug 100m (10%) 0 (0%) 200Mi (5%) 200Mi (5%)
kube-system kube-dns-v17-qp5la 110m (11%) 110m (11%) 120Mi (3%) 220Mi (5%)
kube-system kube-proxy-gke-prod-cluster-default-pool-b39c7f0c-c0ug 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-v1.1.0-orphh 100m (10%) 100m (10%) 50Mi (1%) 50Mi (1%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
910m (91%) 210m (21%) 370Mi (9%) 470Mi (12%)
Sure, I have 91% allocated and cannot fit another 10% into it. But is it not possible to overcommit resources?
The server's actual CPU usage averages around 10%.
It would be a shame if I could not use more resources.
I recently had this same issue. After some research I found that GKE creates a default LimitRange
with a default CPU request of 100m per container
. You can check this by running kubectl get limitrange -o=yaml
. It will display something like this:
apiVersion: v1
items:
- apiVersion: v1
  kind: LimitRange
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","kind":"LimitRange","metadata":{"annotations":{},"name":"limits","namespace":"default"},"spec":{"limits":[{"defaultRequest":{"cpu":"100m"},"type":"Container"}]}}
    creationTimestamp: 2017-11-16T12:15:40Z
    name: limits
    namespace: default
    resourceVersion: "18741722"
    selfLink: /api/v1/namespaces/default/limitranges/limits
    uid: dcb25a24-cac7-11e7-a3d5-42010a8001b6
  spec:
    limits:
    - defaultRequest:
        cpu: 100m
      type: Container
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
This default request is applied to every container that does not specify its own. So, for instance, on a 4-core node, assuming each of your pods creates 2 containers, only around ~20 pods can be scheduled (4000m divided by 200m requested per pod).
The "fix" here is either to change the default LimitRange
to values of your own and then remove the old pods so they are recreated with the updated defaults, or to set the requests and limits directly on the pods when creating them (see the sketches below).
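For example, a minimal sketch of an updated LimitRange that you could apply with kubectl apply -f limits.yaml (the 50m value is only an illustrative assumption; pick whatever default fits your workloads):

apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: default
spec:
  limits:
  # default CPU request for containers that do not set one themselves
  - defaultRequest:
      cpu: 50m          # example value, not a recommendation
    type: Container

Note that the LimitRange default is only applied at pod creation time, so existing pods keep their old 100m request until they are deleted and recreated.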
Some reading material:
https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
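As a sketch of the second option mentioned above (setting resources directly so the LimitRange default no longer applies), the names, image, and values here are placeholders for illustration only:

apiVersion: v1
kind: Pod
metadata:
  name: www-caveconditions   # hypothetical name for illustration
spec:
  containers:
  - name: web
    image: nginx             # placeholder image
    resources:
      # explicit request overrides the namespace's defaultRequest
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi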
Yes, overcommit is currently not supported. It is among the planned improvements: http://kubernetes.io/docs/user-guide/compute-resources. Related issue on GitHub: https://github.com/kubernetes/kubernetes/issues/168
PS: in theory you can define a custom node capacity, but I'm not sure.