I have a Kubernetes deployment in which I am trying to run 5 Docker containers inside a single pod on a single node. The pod hangs in the "Pending" state and is never scheduled. I don't mind running more than one pod, but I'd like to keep the number of nodes down. I assumed one node with 1 CPU and 1.7GB RAM would be enough for the 5 containers, and I have attempted to split the workload across them.
Initially I concluded that I had insufficient resources, so I enabled node autoscaling, which produced the following (see the kubectl describe pod output below):
pod didn't trigger scale-up (it wouldn't fit if a new node is added)
Each Docker container runs a simple command for a fairly simple app. Ideally I would rather not deal with CPU and RAM resource allocation at all, but even after setting the CPU/memory limits so that they don't add up to more than 1 CPU, I still get this (see kubectl describe po/test-529945953-gh6cl below):
No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
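For reference, each container's resources section currently looks roughly like this (the same values appear in the kubectl describe output below):

    resources:
      requests:            # the scheduler places pods based on requests, not limits
        cpu: 100m
        memory: 375Mi
      limits:
        cpu: 150m
        memory: 375Mi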
Below is the output of various commands showing the current state. Any help on what I'm doing wrong would be appreciated.
kubectl get all
user_s@testing-11111:~/gce$ kubectl get all
NAME                      READY     STATUS    RESTARTS   AGE
po/test-529945953-gh6cl   0/5       Pending   0          34m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.7.240.1   <none>        443/TCP   19d

NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/test   1         1         1            0           34m

NAME                DESIRED   CURRENT   READY     AGE
rs/test-529945953   1         1         0         34m
user_s@testing-11111:~/gce$
kubectl describe po/test-529945953-gh6cl
user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name:           test-529945953-gh6cl
Namespace:      default
Node:           <none>
Labels:         app=test
                pod-template-hash=529945953
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status:         Pending
IP:
Created By:     ReplicaSet/test-529945953
Controlled By:  ReplicaSet/test-529945953
Containers:
  container-test2-tickers:
    Image:  gcr.io/testing-11111/testology:latest
    Port:   <none>
    Command:
      process_cmd
      arg1
      test2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-kraken-tickers:
    Image:  gcr.io/testing-11111/testology:latest
    Port:   <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-gdax-tickers:
    Image:  gcr.io/testing-11111/testology:latest
    Port:   <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-bittrex-tickers:
    Image:  gcr.io/testing-11111/testology:latest
    Port:   <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:
      DB_HOST:      127.0.0.1:5432
      DB_PASSWORD:  <set to the key 'password' in secret 'cloudsql-db-credentials'>  Optional: false
      DB_USER:      <set to the key 'username' in secret 'cloudsql-db-credentials'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  cloudsql-proxy:
    Image:  gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:   <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=testing-11111:europe-west2:testology=tcp:5432
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     150m
      memory:  375Mi
    Requests:
      cpu:     100m
      memory:  375Mi
    Environment:  <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  ssl-certs:
    Type:  HostPath (bare host directory volume)
    Path:  /etc/ssl/certs
  cloudsql:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-b2mxc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-b2mxc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen  LastSeen  Count  From                SubObjectPath  Type     Reason             Message
  ---------  --------  -----  ----                -------------  ----     ------             -------
  27m        17m       44     default-scheduler                  Warning  FailedScheduling   No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
  26m        8s        150    cluster-autoscaler                 Normal   NotTriggerScaleUp  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  16m        2s        63     default-scheduler                  Warning  FailedScheduling   No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$
kubectl get nodes
user_s@testing-11111:~/gce$ kubectl get nodes
NAME                                  STATUS    AGE       VERSION
gke-test-default-pool-abdf83f7-p4zw   Ready     9h        v1.6.7
kubectl get pods
user_s@testing-11111:~/gce$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
test-529945953-gh6cl   0/5       Pending   0          38m
kubectl describe nodes
user_s@testing-11111:~/gce$ kubectl describe nodes
Name:                   gke-test-default-pool-abdf83f7-p4zw
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/fluentd-ds-ready=true
                        beta.kubernetes.io/instance-type=g1-small
                        beta.kubernetes.io/os=linux
                        cloud.google.com/gke-nodepool=default-pool
                        failure-domain.beta.kubernetes.io/region=europe-west2
                        failure-domain.beta.kubernetes.io/zone=europe-west2-c
                        kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations:            node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:                 <none>
CreationTimestamp:      Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------   -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False    Tue, 26 Sep 2017 02:06:05 +0100   Tue, 26 Sep 2017 02:06:05 +0100   RouteCreated                 RouteController created a route
  OutOfDisk            False    Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure       False    Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False    Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready                True     Tue, 26 Sep 2017 11:33:57 +0100   Tue, 26 Sep 2017 02:06:05 +0100   KubeletReady                 kubelet is posting ready status. AppArmor enabled
  KernelDeadlock       False    Tue, 26 Sep 2017 11:33:12 +0100   Tue, 26 Sep 2017 02:05:45 +0100   KernelHasNoDeadlock          kernel has no deadlock
Addresses:
  InternalIP:  10.154.0.2
  ExternalIP:  35.197.217.1
  Hostname:    gke-test-default-pool-abdf83f7-p4zw
Capacity:
  cpu:     1
  memory:  1742968Ki
  pods:    110
Allocatable:
  cpu:     1
  memory:  1742968Ki
  pods:    110
System Info:
  Machine ID:                 e6119abf844c564193495c64fd9bd341
  System UUID:                E6119ABF-844C-5641-9349-5C64FD9BD341
  Boot ID:                    1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
  Kernel Version:             4.4.52+
  OS Image:                   Container-Optimized OS from Google
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.11.2
  Kubelet Version:            v1.6.7
  Kube-Proxy Version:         v1.6.7
PodCIDR:     10.4.1.0/24
ExternalID:  6073438913956157854
Non-terminated Pods:  (7 in total)
  Namespace    Name                                             CPU Requests   CPU Limits   Memory Requests   Memory Limits
  ---------    ----                                             ------------   ----------   ---------------   -------------
  kube-system  fluentd-gcp-v2.0-k565g                           100m (10%)     0 (0%)       200Mi (11%)       300Mi (17%)
  kube-system  heapster-v1.3.0-3440173064-1ztvw                 138m (13%)     138m (13%)   301456Ki (17%)    301456Ki (17%)
  kube-system  kube-dns-1829567597-gdz52                        260m (26%)     0 (0%)       110Mi (6%)        170Mi (9%)
  kube-system  kube-dns-autoscaler-2501648610-7q9dd             20m (2%)       0 (0%)       10Mi (0%)         0 (0%)
  kube-system  kube-proxy-gke-test-default-pool-abdf83f7-p4zw   100m (10%)     0 (0%)       0 (0%)            0 (0%)
  kube-system  kubernetes-dashboard-490794276-25hmn             100m (10%)     100m (10%)   50Mi (2%)         50Mi (2%)
  kube-system  l7-default-backend-3574702981-flqck              10m (1%)       10m (1%)     20Mi (1%)         20Mi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests   CPU Limits   Memory Requests   Memory Limits
  ------------   ----------   ---------------   -------------
  728m (72%)     248m (24%)   700816Ki (40%)    854416Ki (49%)
Events:  <none>
As you can see under "Allocated resources" in the output of your kubectl describe nodes command, 728m (72%) of the node's CPU and 700816Ki (40%) of its memory are already requested by the pods running in the kube-system namespace. The sum of your test pod's resource requests exceeds both the remaining CPU and the remaining memory, which is exactly what the Events section of your kubectl describe po/… output reports.
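Concretely, with the requests shown in your pod description (five containers, each requesting 100m CPU and 375Mi memory):

    CPU:    5 x 100m  = 500m   requested, but only 1000m - 728m = 272m remain free
    Memory: 5 x 375Mi = 1875Mi requested, but the node's allocatable memory is 1742968Ki (about 1702Mi) in total

The memory requests alone exceed what even an empty g1-small node can allocate, which is also why the cluster-autoscaler reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)": assuming the autoscaler adds nodes from the same g1-small pool, a new node would not fit the pod either.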
If you want to keep all containers in a single pod, you need to reduce the containers' resource requests or run the pod on a node type with more CPU and memory. The better solution, though, is to split your application into multiple pods, which allows the scheduler to distribute them across multiple nodes.
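For example, a minimal sketch of the per-pod split (the Deployment name and the reduced memory request are illustrative, not taken from your spec; since your workers reach the database via 127.0.0.1:5432, each pod also needs its own cloudsql-proxy sidecar):

    # One Deployment per worker; repeat for test2, gdax and bittrex.
    apiVersion: apps/v1beta1        # Deployment API group on Kubernetes 1.6
    kind: Deployment
    metadata:
      name: kraken-tickers
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: kraken-tickers
        spec:
          containers:
          - name: kraken-tickers
            image: gcr.io/testing-11111/testology:latest
            command: ["process_cmd", "arg1", "arg2"]
            resources:
              requests:             # keep requests small enough to fit next to kube-system pods
                cpu: 100m
                memory: 200Mi       # illustrative value; measure your app's real usage
              limits:
                cpu: 150m
                memory: 375Mi
          # plus the cloudsql-proxy sidecar container from your original spec

Each such pod requests far less than the original five-container pod, so the scheduler can place the pods on separate nodes, or pack a few of them onto one node as space allows.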