Kubernetes node not responding after restart

2/22/2017

I have a Kubernetes cluster with one master and four nodes. kube-proxy was working fine on all four nodes, and I could reach a service through any node regardless of which node its pod was actually running on; i.e., http://node1:30000 through http://node4:30000 all gave the same response.
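
For reference, the service is exposed through a NodePort on 30000; the mapping can be confirmed with something like the following, where my-service stands in for the real service name:

# my-service is a placeholder for the actual Service name
kubectl get svc my-service -o wide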

After restarting node4 with shutdown -r now, it came back up, but the node no longer responds to requests. I am testing with the following command:

curl http://node4:30000

If I run it from my PC, or from any other node in the cluster -- node1 through node3, or the master -- I get:

curl: (7) Failed to connect to node4 port 30000: Connection timed out

However, if I run it from node4 itself, it responds successfully. This leads me to believe that kube-proxy is running, but something on node4 is blocking external connections to the port.
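
To rule things out, these are the sorts of checks I can run on node4; the kube-proxy pod name is the one from the describe output further down:

# confirm the NodePort rules survived the reboot
sudo iptables-save | grep -w 30000
# pod name taken from the node's non-terminated pod list below
kubectl logs -n kube-system kube-proxy-0p3lj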

When I run kubectl describe node node4, my output looks normal:

Name:                   node4
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=node4
Taints:                 <none>
CreationTimestamp:      Tue, 21 Feb 2017 15:21:17 -0400
Phase:
Conditions:
  Type                  Status  LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
  ----                  ------  -----------------                       ------------------                      ------                          -------
  OutOfDisk             False   Wed, 22 Feb 2017 08:03:40 -0400         Tue, 21 Feb 2017 15:21:18 -0400         KubeletHasSufficientDisk        kubelet has sufficient disk space available
  MemoryPressure        False   Wed, 22 Feb 2017 08:03:40 -0400         Tue, 21 Feb 2017 15:21:18 -0400         KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False   Wed, 22 Feb 2017 08:03:40 -0400         Tue, 21 Feb 2017 15:21:18 -0400         KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 True    Wed, 22 Feb 2017 08:03:40 -0400         Tue, 21 Feb 2017 15:21:28 -0400         KubeletReady                    kubelet is posting ready status. AppArmor enabled
Addresses:              10.6.81.64,10.6.81.64,node4
Capacity:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   2
 memory:                                4028748Ki
 pods:                                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   2
 memory:                                4028748Ki
 pods:                                  110
System Info:
 Machine ID:                    dbc0bb6ba10acae66b1061f958220ade
 System UUID:                   4229186F-AA5C-59CE-E5A2-258C1BBE9D2C
 Boot ID:                       a3968e6c-eba3-498c-957f-f29283af1cff
 Kernel Version:                4.4.0-63-generic
 OS Image:                      Ubuntu 16.04.1 LTS
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.13.0
 Kubelet Version:               v1.5.2
 Kube-Proxy Version:            v1.5.2
ExternalID:                     node4
Non-terminated Pods:            (27 in total)
  Namespace                     Name                                                                    CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                                                    ------------    ----------      --------------- -------------
  << application pods listed here >>
  kube-system                   kube-proxy-0p3lj                                                        0 (0%)          0 (0%)          0 (0%)          0 (0%)
  kube-system                   weave-net-uqmr1                                                         20m (1%)        0 (0%)          0 (0%)          0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  20m (1%)      0 (0%)          0 (0%)          0 (0%)

Is there anything specific I need to do to bring a node back online after a system restart?

-- Mike Clemens
kubernetes

1 Answer

2/23/2017

My team was able to solve this by downgrading Docker from 1.13 to 1.12. The problem appears to be related to this issue:

https://github.com/kubernetes/kubernetes/issues/40182

After the downgrade, everything is working again.
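
For anyone else hitting this on Ubuntu 16.04: the downgrade itself is just a matter of installing an older Docker package and restarting the kubelet. Depending on which repository Docker came from, the package may be docker-engine or docker.io, and the 1.12 version string below is only an example, so check what your repository actually offers first:

# list the Docker builds available from your apt repository
apt-cache madison docker-engine
# example 1.12 version string; substitute whatever madison reports
sudo apt-get install --allow-downgrades docker-engine=1.12.6-0~ubuntu-xenial
sudo systemctl restart kubelet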

-- Mike Clemens
Source: StackOverflow