I set up a Kubernetes cluster with three nodes. All of my nodes show status Ready, but the scheduler doesn't seem to find one of them. How could this happen?
[root@master1 app]# kubectl get nodes
NAME          LABELS                                         STATUS    AGE
172.16.0.44   kubernetes.io/hostname=172.16.0.44,pxc=node1   Ready     8d
172.16.0.45   kubernetes.io/hostname=172.16.0.45             Ready     8d
172.16.0.46   kubernetes.io/hostname=172.16.0.46             Ready     8d
I use nodeSelector in my RC file like this:
nodeSelector:
  pxc: node1
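The nodeSelector sits under the pod template's spec in the RC. For reference, a minimal sketch of such a manifest (names match the describe output below; the /data/db mountPath is illustrative, not taken from my actual file):

apiVersion: v1
kind: ReplicationController
metadata:
  name: mongo-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    k8s-app: mongo
  template:
    metadata:
      labels:
        k8s-app: mongo
    spec:
      nodeSelector:
        pxc: node1               # must match a label on the target node
      containers:
      - name: mongo
        image: mongo
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db    # illustrative mount point, not from the original file
      volumes:
      - name: mongo-persistent-storage
        hostPath:
          path: /k8s/mongodb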
Describing the RC:
Name: mongo-controller
Namespace: kube-system
Image(s): mongo
Selector: k8s-app=mongo
Labels: k8s-app=mongo
Replicas: 1 current / 1 desired
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Volumes:
  mongo-persistent-storage:
    Type:    HostPath (bare host directory volume)
    Path:    /k8s/mongodb
Events:
  FirstSeen   LastSeen   Count   From                        SubobjectPath   Reason             Message
  ─────────   ────────   ─────   ────                        ─────────────   ──────             ───────
  25m         25m        1       {replication-controller }                   SuccessfulCreate   Created pod: mongo-controller-0wpwu
The pod stays Pending:
[root@master1 app]# kubectl get pods mongo-controller-0wpwu --namespace=kube-system
NAME                     READY     STATUS    RESTARTS   AGE
mongo-controller-0wpwu   0/1       Pending   0          27m
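A quick way to confirm the pod was never assigned to a node is the wide output, which should add a NODE column (it stays empty while the pod is Pending):

kubectl get pods mongo-controller-0wpwu --namespace=kube-system -o wide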
Describing pod mongo-controller-0wpwu:
[root@master1 app]# kubectl describe pod mongo-controller-0wpwu --namespace=kube-system
Name: mongo-controller-0wpwu
Namespace: kube-system
Image(s): mongo
Node: /
Labels: k8s-app=mongo
Status: Pending
Reason:
Message:
IP:
Replication Controllers: mongo-controller (1/1 replicas created)
Containers:
  mongo:
    Container ID:
    Image:           mongo
    Image ID:
    QoS Tier:
      cpu:           BestEffort
      memory:        BestEffort
    State:           Waiting
    Ready:           False
    Restart Count:   0
    Environment Variables:
Volumes:
  mongo-persistent-storage:
    Type:         HostPath (bare host directory volume)
    Path:         /k8s/mongodb
  default-token-7qjcu:
    Type:         Secret (a secret that should populate this volume)
    SecretName:   default-token-7qjcu
Events:
  FirstSeen   LastSeen   Count   From                   SubobjectPath   Reason             Message
  ─────────   ────────   ─────   ────                   ─────────────   ──────             ───────
  22m         37s        12      {default-scheduler }                   FailedScheduling   pod (mongo-controller-0wpwu) failed to fit in any node
              fit failure on node (172.16.0.46): MatchNodeSelector
              fit failure on node (172.16.0.45): MatchNodeSelector
  27m         9s         67      {default-scheduler }                   FailedScheduling   pod (mongo-controller-0wpwu) failed to fit in any node
              fit failure on node (172.16.0.45): MatchNodeSelector
              fit failure on node (172.16.0.46): MatchNodeSelector
Note the IP list in the events: 172.16.0.44 doesn't seem to be seen by the scheduler at all. How could that happen?
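One sanity check is to list nodes through the same label selector the pod uses; if the label really is on the node, it should show up here:

kubectl get nodes -l pxc=node1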
Describing the node 172.16.0.44:
[root@master1 app]# kubectl describe nodes --namespace=kube-system
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Wed, 30 Mar 2016 15:58:47 +0800
Phase:
Conditions:
  Type        Status    LastHeartbeatTime                 LastTransitionTime                Reason                   Message
  ────        ──────    ─────────────────                 ──────────────────                ──────                   ───────
  Ready       True      Fri, 08 Apr 2016 12:18:01 +0800   Fri, 08 Apr 2016 11:18:52 +0800   KubeletReady             kubelet is posting ready status
  OutOfDisk   Unknown   Wed, 30 Mar 2016 15:58:47 +0800   Thu, 07 Apr 2016 17:38:50 +0800   NodeStatusNeverUpdated   Kubelet never posted node status.
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Container Runtime Version: docker://1.10.1
Kubelet Version: v1.2.0
Kube-Proxy Version: v1.2.0
ExternalID: 172.16.0.44
Non-terminated Pods: (1 in total)
  Namespace     Name                              CPU Requests   CPU Limits   Memory Requests   Memory Limits
  ─────────     ────                              ────────────   ──────────   ───────────────   ─────────────
  kube-system   kube-registry-proxy-172.16.0.44   100m (5%)      100m (5%)    50Mi (0%)         50Mi (0%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
  CPU Requests   CPU Limits   Memory Requests   Memory Limits
  ────────────   ──────────   ───────────────   ─────────────
  100m (5%)      100m (5%)    50Mi (0%)         50Mi (0%)
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
59m 59m 1 {kubelet 172.16.0.44} Starting Starting kubelet.
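The suspicious part is the OutOfDisk condition above: Status Unknown with reason NodeStatusNeverUpdated, even though Ready is True. The raw condition can be pulled out of the node object with plain yaml output and grep (avoids relying on jsonpath support):

kubectl get node 172.16.0.44 -o yaml | grep -B 5 'type: OutOfDisk'   # shows the status/reason/timestamps for that condition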
SSHing into .44, I can see the disk space is free (I also removed some Docker images and containers):
[root@iZ25dqhvvd0Z ~]# df -h
Filesystem   Size   Used   Avail   Use%   Mounted on
/dev/xvda1   40G    2.6G   35G     7%     /
devtmpfs     3.9G   0      3.9G    0%     /dev
tmpfs        3.7G   0      3.7G    0%     /dev/shm
tmpfs        3.7G   143M   3.6G    4%     /run
tmpfs        3.7G   0      3.7G    0%     /sys/fs/cgroup
/dev/xvdb    40G    361M   37G     1%     /k8s
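Free space on the filesystem doesn't necessarily mean the kubelet agrees: as far as I can tell, the kubelet of that era flips OutOfDisk based on its --low-diskspace-threshold-mb flag (256MB by default), and the condition goes Unknown when the kubelet stops posting status at all. The flags the kubelet is actually running with can be checked on the node:

ps aux | grep '[k]ubelet'    # [k] avoids matching the grep itself; shows the kubelet command line and flags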
Still, the scheduler's docker logs (v1.3.0-alpha.1) show this:
E0408 05:28:42.679448 1 factory.go:387] Error scheduling kube-system mongo-controller-0wpwu: pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
; retrying
I0408 05:28:42.679577 1 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"mongo-controller-0wpwu", UID:"2d0f0844-fd3c-11e5-b531-00163e000727", APIVersion:"v1", ResourceVersion:"634139", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
We had the same issue in AWS when they changed DNS from IP=DNS name to IP=IP in eu-central: nodes showed Ready but were not reachable via their name.
Thanks for your reply, Robert. I got this resolved by doing the following (sketched as a script below):
kubectl delete rc
kubectl delete node 172.16.0.44
stop the kubelet on 172.16.0.44
rm -rf /k8s/*
restart the kubelet
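In script form, the same recovery looks roughly like this (assuming the kubelet runs as a systemd unit named kubelet; adjust to however it is actually launched):

# on the master
kubectl delete rc mongo-controller --namespace=kube-system
kubectl delete node 172.16.0.44

# on 172.16.0.44
systemctl stop kubelet     # stop the kubelet before touching its data
rm -rf /k8s/*              # clear the hostPath/data directory
systemctl start kubelet    # the node re-registers with the master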
Now the node is Ready, and the OutOfDisk condition is gone.
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Fri, 08 Apr 2016 15:14:51 +0800
Phase:
Conditions:
  Type    Status   LastHeartbeatTime                 LastTransitionTime                Reason         Message
  ────    ──────   ─────────────────                 ──────────────────                ──────         ───────
  Ready   True     Fri, 08 Apr 2016 15:25:33 +0800   Fri, 08 Apr 2016 15:14:50 +0800   KubeletReady   kubelet is posting ready status
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
I found this: https://github.com/kubernetes/kubernetes/issues/4135, but I still don't know why the kubelet thinks it is out of disk when my disk space is free...
The reason the scheduler failed is that it didn't think there was space to fit the pod onto the node. If you look at the conditions for your node, the OutOfDisk condition is Unknown. The scheduler is probably not willing to place a pod onto a node that it thinks doesn't have available disk space.
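A quick way to check that theory is to dump the OutOfDisk condition on every node and look for any status other than False (a sketch using yaml output plus grep):

kubectl get nodes -o yaml | grep -B 5 'type: OutOfDisk'   # one condition block per node; status should be "False" on schedulable nodes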