worker node joining kubernetes cluster with flannel does not have cni0 created

6/14/2018

I created a 3-node Kubernetes cluster v1.10.4 with the flannel network plugin v0.10.0 on CentOS 7.5.1804 in VMware Workstation v14. It worked well for a few days. But today when I brought up the machines, I saw that the two worker nodes no longer have the cni0 virtual bridge created after startup.

I tried deleting the node, re-joining it, and rebooting, but none of these fixed the issue. Only manually creating cni0 works around it temporarily, and another reboot erases that setting again.
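
For reference, the manual workaround looks roughly like this (the bridge address is my assumption based on the node's flannel subnet in /run/flannel/subnet.env, so it may differ per node):

# ip link add cni0 type bridge
# ip addr add 10.244.1.1/24 dev cni0
# ip link set cni0 up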

Even while cni0 is missing, kubectl -n kube-system get pods reports everything as normal, but actual inter-node pod communication over 10.244.0.0/16 does not work.
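
A quick way to confirm the breakage is to ping a pod on another node from worker1, for example the kube-dns pod on the master (IP taken from the pod listing below):

$ ping -c 2 10.244.0.8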

The cni package (v0.6.0) is installed on all nodes:

# rpm -qa | grep cni
kubernetes-cni-0.6.0-0.x86_64
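
I assume the plugin binaries themselves live under /opt/cni/bin (the kubelet's default CNI bin directory), which can be checked with:

# ls /opt/cni/bin/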

Following are the commands to initialize the cluster:

# kubeadm init --apiserver-advertise-address 192.168.238.7 --kubernetes-version 1.10.4 --pod-network-cidr=10.244.0.0/16
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
# after worker1 re-joined cluster:
# kubectl -n kube-system get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP              NODE
etcd-master                      1/1       Running   0          2h        192.168.238.7   master
kube-apiserver-master            1/1       Running   0          2h        192.168.238.7   master
kube-controller-manager-master   1/1       Running   0          2h        192.168.238.7   master
kube-dns-86f4d74b45-cc2ph        3/3       Running   0          2h        10.244.0.8      master
kube-flannel-ds-fn6fx            1/1       Running   0          2h        192.168.238.7   master
kube-flannel-ds-h9qlf            1/1       Running   0          10m       192.168.238.8   worker1
kube-proxy-vjszc                 1/1       Running   0          2h        192.168.238.7   master
kube-proxy-z2bcp                 1/1       Running   0          10m       192.168.238.8   worker1
kube-scheduler-master            1/1       Running   0          2h        192.168.238.7   master

On worker1:

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:36:b0:02 brd ff:ff:ff:ff:ff:ff
3: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:36:b0:0c brd ff:ff:ff:ff:ff:ff
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:f0:ec:c8:cd brd ff:ff:ff:ff:ff:ff
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether 02:ec:81:b6:ef:4d brd ff:ff:ff:ff:ff:ff

Routing table on worker1:

[root@worker1 ~]# ip ro
default via 192.168.64.2 dev ens34 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
169.254.0.0/16 dev ens33 scope link metric 1002 
169.254.0.0/16 dev ens34 scope link metric 1003 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 
192.168.64.0/24 dev ens34 proto kernel scope link src 192.168.64.135 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 
192.168.238.0/24 dev ens33 proto kernel scope link src 192.168.238.8 

The Docker version is v18.03, and iptables rule management is disabled for the Docker daemon:

# cat /etc/docker/daemon.json 
{
    "iptables": false
}

The question is: how could cni0 go missing and never be re-created after a reboot or after re-joining the Kubernetes cluster? Are there any places I should check?
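
The only places I can think of to check so far are the CNI config directory, the flannel subnet file, and the kubelet logs (the paths below are the defaults and may differ):

# ls /etc/cni/net.d/
# cat /run/flannel/subnet.env
# journalctl -u kubelet | grep -i cni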

One related point is that the Kubernetes cluster is deployed in VMs, so I have to turn them on and off from time to time. But the Kubernetes documentation does not describe any procedure for shutting down a cluster other than tearing it down completely. Is there a more elegant way to stop a cluster and avoid potential damage to cluster integrity?
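
The only procedure I can think of (it is not from the official docs, so treat it as a guess) is to drain each worker before powering it off and uncordon it after it comes back; drain and uncordon run on the master, the shutdown on the worker itself:

# kubectl drain worker1 --ignore-daemonsets
# shutdown -h now
# after worker1 is powered back on:
# kubectl uncordon worker1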

-- robert
flannel
kubernetes

0 Answers