Fresh install: Kubernetes worker nodes never become "Ready"

4/14/2017

I've been battling a Kubernetes install problem. We stood up a new OpenStack environment, and the scripts that work in the old environment fail in the new one.

We are running K8s v1.5.4, installed with these scripts: https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic

CoreOS 1298.7.0

The master seems to come up fine: I can deploy pods to it, and it always shows Ready when running kubectl get nodes.

The worker installation script runs without errors; however, the worker never reaches a Ready state.

kubectl get nodes --show-labels
NAME             STATUS                     AGE       LABELS
MYIP.118.240.122   Ready,SchedulingDisabled   7m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.122
MYIP.118.240.129   NotReady                   5m        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.129

If I run kubectl describe node MYIP.118.240.129 I get the following:

(testtest)➜  dev kubectl describe node MYIP.118.240.129
Name:           MYIP.118.240.129
Role:
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=MYIP.118.240.129
Taints:         <none>
CreationTimestamp:  Fri, 14 Apr 2017 15:27:47 -0600
Phase:
Conditions:
  Type          Status      LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------      -----------------           ------------------          ------              -------
  OutOfDisk         Unknown     Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:28:29 -0600     NodeStatusUnknown       Kubelet stopped posting node status.
  MemoryPressure    False       Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:27:47 -0600     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False       Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:27:47 -0600     KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready         Unknown     Fri, 14 Apr 2017 15:27:47 -0600     Fri, 14 Apr 2017 15:28:29 -0600     NodeStatusUnknown       Kubelet stopped posting node status.
Addresses:      MYIP.118.240.129,MYIP.118.240.129,MYIP.118.240.129
Capacity:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                2052924Ki
 pods:                  110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:    0
 cpu:                   1
 memory:                2052924Ki
 pods:                  110
System Info:
 Machine ID:            efee03ac51c641888MYIP50dfa2a40350d
 System UUID:           4467C959-37FE-48ED-A263-C36DD0D445F1
 Boot ID:           50eb5e93-5aed-441b-b3ef-36da1472e4ea
 Kernel Version:        4.9.16-coreos-r1
 OS Image:          Container Linux by CoreOS 1298.7.0 (Ladybug)
 Operating System:      linux
 Architecture:          amd64
 Container Runtime Version: docker://1.12.6
 Kubelet Version:       v1.5.4+coreos.0
 Kube-Proxy Version:        v1.5.4+coreos.0
ExternalID:         MYIP.118.240.129
Non-terminated Pods:        (5 in total)
  Namespace         Name                        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                        ------------    ----------  --------------- -------------
  kube-system           heapster-v1.2.0-216693398-sfz1m         50m (5%)    50m (5%)    90Mi (4%)   90Mi (4%)
  kube-system           kube-dns-782804071-psmfc            260m (26%)  0 (0%)      140Mi (6%)  220Mi (10%)
  kube-system           kube-dns-autoscaler-2715466192-jmb3h        20m (2%)    0 (0%)      10Mi (0%)   0 (0%)
  kube-system           kube-proxy-MYIP.118.240.129         0 (0%)      0 (0%)      0 (0%)      0 (0%)
  kube-system           kubernetes-dashboard-3543765157-w8zv2       100m (10%)  100m (10%)  50Mi (2%)   50Mi (2%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  430m (43%)    150m (15%)  290Mi (14%) 360Mi (17%)
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  11m       11m     1   {kubelet MYIP.118.240.129}          Normal      Starting        Starting kubelet.
  11m       11m     1   {kubelet MYIP.118.240.129}          Warning     ImageGCFailed       unable to find data for container /
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasSufficientDisk   Node MYIP.118.240.129 status is now: NodeHasSufficientDisk
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasSufficientMemory Node MYIP.118.240.129 status is now: NodeHasSufficientMemory
  11m       11m     2   {kubelet MYIP.118.240.129}          Normal      NodeHasNoDiskPressure   Node MYIP.118.240.129 status is now: NodeHasNoDiskPressure
(testtest)➜  dev

All ports are open within this internal network between the worker and master.
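A quick sketch of how to check this from the worker (the etcd client port and plain-HTTP endpoint are assumptions based on the generic scripts; the master IP is redacted as above):

    # From the worker: both the API server and etcd client ports accept connections.
    curl -k -sS -o /dev/null https://MYIP.118.240.122:443/ && echo "apiserver 443 reachable"
    curl -sS -o /dev/null http://MYIP.118.240.122:2379/version && echo "etcd 2379 reachable"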

If I run docker ps on the worker I get:

CONTAINER ID        IMAGE                                      COMMAND                  CREATED             STATUS              PORTS               NAMES
c25cf12b43f3        quay.io/coreos/hyperkube:v1.5.4_coreos.0   "/hyperkube proxy --m"   4 minutes ago       Up 4 minutes                            k8s_kube-proxy.96aded63_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_5ba9628a
c4d14dfd7d52        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 6 minutes ago       Up 6 minutes                            k8s_POD.d8dbe16c_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_e8a1c6d6

Kubelet logs after it ran all weekend (newest first):

Apr 17 20:53:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:53:15.507939    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:48:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:48:15.484016    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:15.405888    1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: W0417 20:43:07.361035    1353 kubelet.go:1497] Deleting mirror pod "kube-proxy-MYIP.118.240.129_kube-system(37537fb7-2159-11e7-b692-fa163e952b1c)" because it is outdated
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.018406    1353 event.go:208] Unable to write event: 'Post https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/events: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer' (may retry after sleeping)
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017813    1353 reflector.go:188] pkg/kubelet/kubelet.go:386: Failed to list *api.Node: Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017711    1353 reflector.go:188] pkg/kubelet/kubelet.go:378: Failed to list *api.Service: Get https://MYIP.118.240.122:443/api/v1/services?resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016457    1353 kubelet_node_status.go:302] Error updating node status, will retry: error getting node "MYIP.118.240.129": Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.0161MYIP    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8ea63b2-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.016165356 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8ea63b2-2159-11e7-b692-fa163e952b1c" (UID: "e8ea63b2-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016058    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015943    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec05331e-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015913703 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec05331e-2158-11e7-b692-fa163e952b1c" (UID: "ec05331e-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015843    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015732    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8fdcca4-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015656131 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8fdcca4-2159-11e7-b692-fa163e952b1c" (UID: "e8fdcca4-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015559    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015429    1353 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get https://MYIP.118.240.122:443/api/v1/pods?fieldSelector=spec.nodeName%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012918    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec091be8-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012889039 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec091be8-2158-11e7-b692-fa163e952b1c" (UID: "ec091be8-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012820    1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012661    1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec09da25-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012630687 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec09da25-2158-11e7-b692-fa163e952b1c" (UID: "ec09da25-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer

As you can see in the logs, the worker node is having trouble talking to the master node.

However, if I SSH into the worker and run a command like this, I do get a response:

core@philtest ~ $ curl https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7 --insecure
Unauthorized

It's TLS, so of course I didn't expect it to authenticate without a certificate, but the connection itself works.
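To rule out a client-certificate problem, the same request can be repeated with the worker's own TLS assets (a sketch; the /etc/kubernetes/ssl paths are the defaults from the CoreOS generic scripts and may differ in your setup):

    # Authenticate with the worker's client cert instead of --insecure
    curl --cacert /etc/kubernetes/ssl/ca.pem \
         --cert /etc/kubernetes/ssl/worker.pem \
         --key /etc/kubernetes/ssl/worker-key.pem \
         https://MYIP.118.240.122:443/api/v1/nodes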

Any suggestions on how to debug this?

Thanks!

-- phil swenson
coreos
kubernetes

2 Answers

6/12/2017

It turned out the problem was an inconsistent MTU setting in the OpenStack network: packets larger than roughly 1500 bytes were being dropped.
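For anyone hitting the same thing, a sketch of how to confirm it from the worker (1472 bytes of ICMP payload plus 28 bytes of headers makes a full 1500-byte packet; the eth0 interface name is an assumption):

    # Small packets with the DF bit set go through...
    ping -c 3 -M do -s 1400 MYIP.118.240.122
    # ...but full-MTU packets are dropped if the path MTU is actually lower.
    ping -c 3 -M do -s 1472 MYIP.118.240.122
    # Temporary workaround until the OpenStack network MTU is fixed:
    sudo ip link set dev eth0 mtu 1400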

-- phil swenson
Source: StackOverflow

4/21/2017

Check whether you added your IP addresses to the SSL generation file (openssl.cnf) for the master. Try recreating your certificate with the IP of your DNS service as well (if you followed the CoreOS guide, it's 10.3.0.1). Your openssl.cnf should look like this:

 [req]
 req_extensions = v3_req
 distinguished_name = req_distinguished_name
 [req_distinguished_name]
 [ v3_req ]
 basicConstraints = CA:FALSE
 keyUsage = nonRepudiation, digitalSignature, keyEncipherment
 subjectAltName = @alt_names
 [alt_names]
 DNS.1 = kubernetes
 DNS.2 = kubernetes.default
 DNS.3 = kubernetes.default.svc
 DNS.4 = kubernetes.default.svc.cluster.local
 IP.1 = 10.3.0.1
 IP.2 = PRIVATE_MASTER_IP
 IP.3 = PUBLIC_MASTER_IP

You will also need to recreate the certificates for the node(s). After that, delete the default-token secret from the affected namespaces so that it is automatically regenerated, as sketched below. Source: the CoreOS docs.
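A sketch of those steps, following the pattern in the CoreOS cluster-TLS docs (the file names and 365-day validity are assumptions; adjust to your layout):

    # Re-sign the API server certificate with the updated SANs from openssl.cnf
    openssl genrsa -out apiserver-key.pem 2048
    openssl req -new -key apiserver-key.pem -out apiserver.csr \
        -subj "/CN=kube-apiserver" -config openssl.cnf
    openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem \
        -CAcreateserial -out apiserver.pem -days 365 \
        -extensions v3_req -extfile openssl.cnf

    # After installing the new certs and restarting the components, delete the
    # default token (name taken from the question's logs) so it is regenerated:
    kubectl delete secret default-token-93sd7 --namespace=kube-system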

-- Julien Du Bois
Source: StackOverflow