Adding a node to a cluster with Flannel: "cannot join network of a non running container"

8/25/2019

I am adding a node to my Kubernetes cluster, which uses flannel. These are the nodes in the cluster (kubectl get nodes):

NAME              STATUS     ROLES    AGE    VERSION
jetson-80         NotReady   <none>   167m   v1.15.0
p4                Ready      master   18d    v1.15.0
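To narrow down why jetson-80 stays NotReady, a small helper can pick the NotReady nodes out of the kubectl get nodes output and print the conditions the kubelet reports for one of them. This is a sketch that assumes bash, awk and kubectl on the PATH; the function names are hypothetical.

```shell
# Print the names of nodes whose STATUS is not exactly "Ready".
# Input: `kubectl get nodes` output on stdin; the header line is skipped.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Print the Conditions section of `kubectl describe node` for NODE.
# Assumes kubectl access and that the "Addresses:" section follows "Conditions:".
describe_node_conditions() {
  local node="$1"
  kubectl describe node "$node" | sed -n '/^Conditions:/,/^Addresses:/p'
}
```

For example: kubectl get nodes | not_ready_nodes, then describe_node_conditions jetson-80. The message on the Ready condition often states the reason, for instance that the CNI network plugin is not ready.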

The new machine is reachable over the same network. When it joins the cluster, Kubernetes pulls several images, including k8s.gcr.io/pause:3.1, but the pull fails:

Warning  FailedCreatePodSandBox  15d  kubelet,jetson-81  Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Error response from daemon: Get https://k8s.gcr.io/v2/: read tcp 192.168.8.81:58820->108.177.126.82:443: read: connection reset by peer

The machine is connected to the internet, but only wget works; ping does not.
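Since wget works, outbound HTTPS is not blocked in general, and a failing ping may only mean that ICMP is filtered. To see how the connection to the registry fails, the sketch below probes the endpoint from the error message and translates curl's exit status. It assumes curl is installed on the node; the function names are hypothetical, and only a handful of curl's documented exit codes are mapped.

```shell
# Translate a curl exit status into a short diagnosis (subset of curl's codes).
curl_exit_meaning() {
  case "$1" in
    0)  echo "ok: HTTPS request completed" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "could not connect" ;;
    28) echo "timeout" ;;
    35) echo "TLS handshake failed" ;;
    56) echo "connection reset or failure receiving data" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Probe the registry endpoint (default: the URL from the error above).
# Any HTTP status counts as success here, because -f is not used.
probe_registry() {
  local url="${1:-https://k8s.gcr.io/v2/}"
  local rc=0
  curl -sS -o /dev/null --max-time 15 "$url" || rc=$?
  curl_exit_meaning "$rc"
}
```

A result of 35 or 56 would match the "connection reset by peer" seen by the Docker daemon, which points at something on the network path rather than at Kubernetes.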

As a workaround, I pulled the images on another machine and copied them to the node, which now has:

REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                                    v1.15.0             d235b23c3570        2 months ago        82.4MB
quay.io/coreos/flannel                                   v0.11.0-arm64       32ffa9fadfd7        6 months ago        53.5MB
k8s.gcr.io/pause                                         3.1                 da86e6ba6ca1        20 months ago       742kB
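One thing worth ruling out with copied images is architecture. The Jetson is an arm64 board, while k8s.gcr.io/pause and k8s.gcr.io/kube-proxy are typically published as multi-architecture images, so pulling them on an x86_64 machine will normally fetch the amd64 variant, which cannot run on arm64. A pause (sandbox) container that exits immediately would be consistent with the "cannot join network of a non running container" errors. The sketch below copies an image over SSH with docker save and docker load and checks its architecture on the node. It assumes passwordless SSH, a user allowed to run docker on both machines, and hypothetical function names.

```shell
# Translate `uname -m` output into Docker's architecture naming.
docker_arch_for_machine() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l)        echo arm ;;
    *)             echo "$1" ;;
  esac
}

# Stream IMAGE from the local Docker into the Docker daemon on HOST.
transfer_image() {
  local image="$1" host="$2"
  docker save "$image" | ssh "$host" docker load
}

# Run on the node: fail if IMAGE was built for a different architecture.
check_image_arch() {
  local image="$1" want have
  want="$(docker_arch_for_machine "$(uname -m)")"
  have="$(docker image inspect --format '{{.Architecture}}' "$image")" || return 1
  if [ "$have" != "$want" ]; then
    echo "$image is $have but this node is $want" >&2
    return 1
  fi
  echo "$image: $have (ok)"
}
```

For example, on the node: check_image_arch k8s.gcr.io/pause:3.1 and check_image_arch k8s.gcr.io/kube-proxy:v1.15.0. If either reports amd64, pull it again on a machine of the same architecture as the node before transferring it.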

Here is the list of pods, as seen from the master:

NAME                              READY   STATUS                  RESTARTS   AGE
coredns-5c98db65d4-gmsz7          1/1     Running                 0          2d22h
coredns-5c98db65d4-j6gz5          1/1     Running                 0          2d22h
etcd-p4                           1/1     Running                 0          2d22h
kube-apiserver-p4                 1/1     Running                 0          2d22h
kube-controller-manager-p4        1/1     Running                 0          2d22h
kube-flannel-ds-amd64-cq7kz       1/1     Running                 9          17d
kube-flannel-ds-arm64-4s7kk       0/1     Init:CrashLoopBackOff   0          2m8s
kube-proxy-l2slz                  0/1     CrashLoopBackOff        4          2m8s
kube-proxy-q6db8                  1/1     Running                 0          2d22h
kube-scheduler-p4                 1/1     Running                 0          2d22h
tiller-deploy-5d6cc99fc-rwdrl     1/1     Running                 1          17d
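To list only the pods that are not fully ready (here the flannel and kube-proxy pods scheduled on the new node), the listing can be filtered on its READY and STATUS columns. A minimal bash/awk sketch with a hypothetical function name; it reads kubectl get pods output on stdin and would also flag pods in the Completed state.

```shell
# Print NAME READY STATUS for pods that are short of ready containers
# or whose STATUS is not "Running". Works with plain or -o wide output,
# since NAME, READY and STATUS are the first three columns in both.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}
```

For example, kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=jetson-80 | unhealthy_pods restricts the check to the new node.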

Copying the images did not help either: the corresponding flannel pod, kube-flannel-ds-arm64-4s7kk, still fails. Its events:

  Type     Reason          Age                            From                      Message
  ----     ------          ----                           ----                      -------
  Normal   Scheduled       66s                            default-scheduler         Successfully assigned kube-system/kube-flannel-ds-arm64-4s7kk to jetson-80
  Warning  Failed          <invalid>                      kubelet, jetson-80        Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 68ffc44cf8cd655234691b0362615f97c59d285bec790af40f890510f27ba298
  Warning  Failed          <invalid>                      kubelet, jetson-80        Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: a196d8540b68dc7fcd97b0cda1e2f3183d1410598b6151c191b43602ac2faf8e
  Warning  Failed          <invalid>                      kubelet, jetson-80        Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 9d05d1fcb54f5388ca7e64d1b6627b05d52aea270114b5a418e8911650893bc6
  Warning  Failed          <invalid>                      kubelet, jetson-80        Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 5b730961cddf5cc3fb2af564b1abb46b086073d562bb2023018cd66fc5e96ce7
  Normal   Created         <invalid> (x5 over <invalid>)  kubelet, jetson-80        Created container install-cni
  Warning  Failed          <invalid>                      kubelet, jetson-80        Error: failed to start container "install-cni": Error response from daemon: cannot join network of a non running container: 1767e9eb9198969329eaa14a71a110212d6622a8b9844137ac5b247cb9e90292
  Normal   SandboxChanged  <invalid> (x5 over <invalid>)  kubelet, jetson-80        Pod sandbox changed, it will be killed and re-created.
  Warning  BackOff         <invalid> (x4 over <invalid>)  kubelet, jetson-80        Back-off restarting failed container
  Normal   Pulled          <invalid> (x6 over <invalid>)  kubelet, jetson-80        Container image "quay.io/coreos/flannel:v0.11.0-arm64" already present on machine
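Each of these events names the sandbox container the kubelet tried to join, and Docker refuses because that container is no longer running. On the node, the IDs can be pulled out of the describe output and inspected for their exit code and error. A sketch assuming the Docker runtime and bash; the function names are hypothetical, and the containers may already have been garbage-collected by the time you look.

```shell
# Extract unique 64-hex-digit container IDs from pasted event text (stdin).
sandbox_ids_from_events() {
  grep -oE '[0-9a-f]{64}' | sort -u
}

# Run on the node: for each ID on stdin, show name, exit code and error.
inspect_sandboxes() {
  local id
  while read -r id; do
    docker inspect --format '{{.Name}} exit={{.State.ExitCode}} err={{.State.Error}}' "$id" \
      || echo "$id: no longer present"
  done
}
```

For example, paste the events above into sandbox_ids_from_events on jetson-80 and pipe the result into inspect_sandboxes. A non-zero exit code or an error such as an exec format problem would explain why the sandbox is not running.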

I still can't tell whether this is a Kubernetes or a flannel issue, and I haven't been able to solve it despite multiple attempts. Please let me know if you need more details.

EDIT:

Using kubectl describe pod -n kube-system kube-proxy-l2slz:

  Normal   Pulled          <invalid> (x67 over <invalid>)    kubelet, ahold-jetson-80  Container image "k8s.gcr.io/kube-proxy:v1.15.0" already present on machine
  Normal   SandboxChanged  <invalid> (x6910 over <invalid>)  kubelet, ahold-jetson-80  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedSync      <invalid> (x77 over <invalid>)    kubelet, ahold-jetson-80  (combined from similar events): error determining status: rpc error: code = Unknown desc = Error: No such container: 03e7ee861f8f63261ff9289ed2d73ea5fec516068daa0f1fe2e4fd50ca42ad12
  Warning  BackOff         <invalid> (x8437 over <invalid>)  kubelet, ahold-jetson-80  Back-off restarting failed container
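The repetition counters show how fast the sandbox is being recreated (6910 SandboxChanged events for kube-proxy alone). The sketch below extracts those counters and adds two places to look next: the previous kube-proxy container's logs and the kubelet journal on the node. It assumes a systemd-managed kubelet; the function names are hypothetical.

```shell
# Print "REASON COUNT" for each event line with an "(xN over ...)" counter.
# Input: the Events section of `kubectl describe pod` on stdin.
event_counts() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^\(x[0-9]+$/) { print $2, substr($i, 3) }
  }'
}

# Logs of the previously terminated kube-proxy container (POD, e.g. kube-proxy-l2slz).
kube_proxy_logs() {
  kubectl logs -n kube-system "$1" --previous
}

# Recent kubelet log on the node; SINCE defaults to the last 10 minutes.
kubelet_journal() {
  journalctl -u kubelet --since "${1:-10min ago}" --no-pager
}
```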
-- Saad Bahir
flannel
kubeadm
kubernetes

1 Answer

8/26/2019

Your problem may be caused by multiple sandbox containers on your node. Try restarting the kubelet:

$ systemctl restart kubelet
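Before and after the restart, it can help to count how many sandbox (pause) containers Docker holds for each pod. The sketch relies on dockershim's naming scheme for sandbox containers, k8s_POD_<pod>_<namespace>_<uid>_<attempt>, and assumes sudo rights on the node plus kubectl access there for the wait; the function names are hypothetical.

```shell
# Count sandbox containers per namespace/pod from container names on stdin.
# Pod and namespace names cannot contain "_", so splitting on it is safe.
sandboxes_per_pod() {
  awk -F_ '$1 == "k8s" && $2 == "POD" { c[$4 "/" $3]++ }
           END { for (p in c) print p, c[p] }' | sort
}

# Restart the kubelet on this node, then wait up to 2 minutes for NODE to be Ready.
restart_kubelet_and_wait() {
  local node="$1"
  sudo systemctl restart kubelet
  kubectl wait --for=condition=Ready "node/$node" --timeout=120s
}
```

For example: docker ps -a --format '{{.Names}}' | sandboxes_per_pod, then restart_kubelet_and_wait jetson-80.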

Also check that you have generated an SSH key pair (ssh-keygen) and copied the public key to the right node, so that the nodes can connect to each other.
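kubeadm join itself does not need SSH between nodes; keys matter mainly for copying files or images between machines, as in the workaround from the question. A sketch of the key setup, assuming OpenSSH's ssh-keygen and ssh-copy-id are available; the function names are hypothetical.

```shell
# Create an ed25519 key without a passphrase (if missing) and install it on HOST (user@host).
setup_ssh_key() {
  local host="$1" key="$HOME/.ssh/id_ed25519"
  [ -f "$key" ] || ssh-keygen -t ed25519 -N '' -f "$key"
  ssh-copy-id -i "$key.pub" "$host"
}

# Succeeds only if HOST accepts the key without prompting for a password.
check_passwordless_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```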

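Please make sure the firewall/security groups allow flannel's traffic between the nodes: UDP port 8472 for the default vxlan backend, or UDP port 8285 for the udp backend. (Port 58820 in the error above is only the ephemeral source port of the TCP connection to k8s.gcr.io, not a flannel port.) Check the flannel logs for errors, and also look for "Subnet added: " messages: each node should have added the other node's subnet.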
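To check the "Subnet added: " messages against what each node should have, the pod CIDR assigned to every node can be compared with the subnets flannel reports. A sketch with hypothetical function names; it assumes flannel's kube subnet manager (as used by the standard kube-flannel.yml), where a node's flannel subnet is its spec.podCIDR, and it relies only on the "Subnet added: " prefix quoted above, not on the rest of the log line format.

```shell
# Print the unique CIDRs from "Subnet added: <CIDR>" messages in flannel logs (stdin).
added_subnets() {
  grep -oE 'Subnet added: [0-9.]+/[0-9]+' | awk '{ print $3 }' | sort -u
}

# Print each node name with the pod CIDR Kubernetes assigned to it.
node_pod_cidrs() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
}
```

For example: node_pod_cidrs, then kubectl logs -n kube-system kube-flannel-ds-amd64-cq7kz | added_subnets for the master's flannel pod. On the new node the flannel pod never gets past its init container, so kubectl logs -n kube-system kube-flannel-ds-arm64-4s7kk -c install-cni is the place to look there.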

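While a ping is running between the nodes, use tcpdump to find where the packets get dropped.

Capture at each hop in turn: source flannel0 (ICMP), source host interface (flannel's UDP port, 8472 with vxlan or 8285 with the udp backend), destination host interface (same UDP port), destination flannel0 (ICMP), and docker0 (ICMP). With the vxlan backend the overlay interface is named flannel.1 instead of flannel0.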
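The capture points can be scripted as below. This sketch assumes the vxlan backend and its default port 8472 (pass 8285 for the udp backend), an interface name you supply (e.g. eth0 for the host NIC, flannel.1 or flannel0 for the overlay, cni0 or docker0 for the bridge), and hypothetical function names; -c 20 just stops each capture after 20 packets.

```shell
# tcpdump filter for a capture point: "overlay" (ICMP on flannel/bridge
# interfaces) or "underlay" (encapsulated UDP on the host NIC; PORT defaults to 8472).
capture_filter() {
  case "$1" in
    overlay)  echo "icmp" ;;
    underlay) echo "udp port ${2:-8472}" ;;
    *) echo "unknown capture point: $1" >&2; return 1 ;;
  esac
}

# Capture 20 packets on IFACE for POINT (overlay|underlay), optional PORT.
capture() {
  local iface="$1" point="$2" filter
  filter="$(capture_filter "$point" "$3")" || return 1
  sudo tcpdump -ni "$iface" -c 20 "$filter"
}
```

For example, on the source node: capture flannel.1 overlay, then capture eth0 underlay; repeat on the destination node to see at which hop the packets disappear.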

See also the flannel documentation.

-- MaggieO
Source: StackOverflow