Node leaves the Kubernetes cluster randomly

4/13/2016

I created a Kubernetes cluster (v1.2.1) in an Azure environment using the tool here. I have 3 etcd nodes, 5 kube nodes (minions), and 1 kube master.

With the current configuration I'm facing an issue where a minion leaves the cluster at random after a few hours. After some debugging, I found that the docker daemon itself did not start on that node.

This is the error message I see when SSHing into the bad node:

CoreOS stable (899.15.0)
Update Strategy: No Reboots
Failed Units: 5
  docker.service
  install-kubernetes.service
  install-weave.service
  locksmithd.service
  docker.socket

$ kubectl get nodes shows the node status as NotReady, and $ kubectl get events shows a weave API error 500 for the pod scheduled on that node.
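For anyone trying to reproduce the check, here is a minimal sketch of picking the affected nodes out of the kubectl get nodes listing. It assumes the usual column layout with a header row and the node name and status in the first two columns; the helper name not_ready_nodes is hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of nodes whose STATUS does not
# start with "Ready" (so "NotReady" is reported, while "Ready" and
# "Ready,SchedulingDisabled" are not).
# Reads `kubectl get nodes` style output on stdin and skips the header row.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage (needs a working kubectl context):
#   kubectl get nodes | not_ready_nodes
```

Each name it prints can then be passed to kubectl describe node to see which condition is failing.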

Restarting the node sometimes works, but often it does not. Can anyone help me debug this issue, or suggest a resolution or some pointers?

$ kubectl describe node kube-03

Name:           kube-03
Labels:         kubernetes.io/hostname=kube-03
CreationTimestamp:  Wed, 13 Apr 2016 02:23:02 +0530
Phase:          
Conditions:
  Type      Status  LastHeartbeatTime           LastTransitionTime          Reason      Message
  ────      ──────  ─────────────────           ──────────────────          ──────      ───────
  OutOfDisk     False   Wed, 13 Apr 2016 21:37:04 +0530     Wed, 13 Apr 2016 18:29:01 +0530     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  Ready     False   Wed, 13 Apr 2016 21:37:04 +0530     Wed, 13 Apr 2016 18:29:01 +0530     KubeletNotReady         container runtime is down
Addresses:  172.18.0.20,172.18.0.20
Capacity:
 cpu:       4
 memory:    28815788Ki
 pods:      110
System Info:
 Machine ID:            8ab8c56a9b72435981be3ca65285a00e
 System UUID:           DBAD108F-9CEC-5548-BB66-22618928D4DA
 Boot ID:           cf27687a-0149-4c40-8f42-db7c4268e6b1
 Kernel Version:        4.3.6-coreos
 OS Image:          CoreOS 899.15.0
 Container Runtime Version: docker://Unknown
 Kubelet Version:       v1.2.1
 Kube-Proxy Version:        v1.2.1
ExternalID:         kube-03
Non-terminated Pods:        (0 in total)
  Namespace         Name        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ─────────         ────        ────────────    ──────────  ─────────────── ─────────────
Allocated resources:
  (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ────────────  ──────────  ─────────────── ─────────────
  0 (0%)    0 (0%)      0 (0%)      0 (0%)
No events.
-- Phagun Baya
azure
kubernetes

0 Answers