I created a 3-node Kubernetes cluster v1.10.4 with the flannel network plugin v0.10.0 on CentOS 7.5.1804 in VMware Workstation v14. It worked well for a few days, but today when I brought the machines up, the two worker nodes no longer had the cni0 virtual bridge created after startup.
I tried deleting the node, re-joining it, and rebooting, but none of that fixed the issue. Only manually creating cni0 works around it temporarily; another reboot erases that setting again.
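The manual workaround is roughly the following (the 10.244.1.1/24 address here is an assumption; the actual subnet is whatever flannel assigned to the node, as recorded in /run/flannel/subnet.env):
# ip link add cni0 type bridge
# ip addr add 10.244.1.1/24 dev cni0
# ip link set cni0 up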
Even when cni0 is missing, kubectl -n kube-system get pods reports everything as normal, but actual inter-node pod communication over 10.244.0.0/16 does not work.
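For example, pinging the kube-dns pod on the master from worker1 (pod IP taken from the kube-dns line in the listing further down) gets no reply:
$ ping -c 3 10.244.0.8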
The CNI package (v0.6.0) is installed on all nodes:
# rpm -qa | grep cni
kubernetes-cni-0.6.0-0.x86_64
These are the commands I used to initialize the cluster:
# kubeadm init --apiserver-advertise-address 192.168.238.7 --kubernetes-version 1.10.4 --pod-network-cidr=10.244.0.0/16
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
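The workers were joined with the usual kubeadm join command (the token and CA cert hash below are placeholders, not the actual values):
# kubeadm join 192.168.238.7:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>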
# after worker1 re-joined the cluster:
# kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
etcd-master 1/1 Running 0 2h 192.168.238.7 master
kube-apiserver-master 1/1 Running 0 2h 192.168.238.7 master
kube-controller-manager-master 1/1 Running 0 2h 192.168.238.7 master
kube-dns-86f4d74b45-cc2ph 3/3 Running 0 2h 10.244.0.8 master
kube-flannel-ds-fn6fx 1/1 Running 0 2h 192.168.238.7 master
kube-flannel-ds-h9qlf 1/1 Running 0 10m 192.168.238.8 worker1
kube-proxy-vjszc 1/1 Running 0 2h 192.168.238.7 master
kube-proxy-z2bcp 1/1 Running 0 10m 192.168.238.8 worker1
kube-scheduler-master 1/1 Running 0 2h 192.168.238.7 master
On worker1:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:36:b0:02 brd ff:ff:ff:ff:ff:ff
3: ens34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:36:b0:0c brd ff:ff:ff:ff:ff:ff
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:ef:0d:68 brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:f0:ec:c8:cd brd ff:ff:ff:ff:ff:ff
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 02:ec:81:b6:ef:4d brd ff:ff:ff:ff:ff:ff
Routing table on worker1:
[root@worker1 ~]# ip ro
default via 192.168.64.2 dev ens34
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
169.254.0.0/16 dev ens33 scope link metric 1002
169.254.0.0/16 dev ens34 scope link metric 1003
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.64.0/24 dev ens34 proto kernel scope link src 192.168.64.135
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
192.168.238.0/24 dev ens33 proto kernel scope link src 192.168.238.8
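When cni0 is present, I would also expect a local route for the node's own pod subnet on that bridge, something like the line below (the exact /24 depends on what flannel assigned to worker1); that route is absent above:
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1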
The Docker version is v18.03, and iptables rule management is disabled for the Docker daemon:
# cat /etc/docker/daemon.json
{
  "iptables": false
}
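For reference, the CNI configuration and the flannel subnet lease sit in their default locations on the workers and can be inspected with:
# ls /etc/cni/net.d/
# cat /run/flannel/subnet.env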
My question is: how could cni0 go missing and never get re-created by a reboot or by re-joining the Kubernetes cluster? Are there any places I should check?
One related point: this cluster is deployed in VMs, so I have to power them on and off from time to time. The Kubernetes documentation does not describe a procedure for shutting down a cluster like this, other than tearing the cluster down entirely. Is there a more elegant way to stop a cluster so as to avoid potential damage to cluster integrity?