I am looking for help to troubleshoot this basic scenario that isn't working OK:
Three nodes installed with kubeadm on VirtualBox VMs running on a MacBook:
sudo kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master Ready master 4h v1.10.2
kubernetes-node1 Ready <none> 4h v1.10.2
kubernetes-node2 Ready <none> 34m v1.10.2
The Virtualbox VMs have 2 adapters: 1) Host-only 2) NAT. The node IP's from the guest computer are:
kubernetes-master (192.168.56.3)
kubernetes-node1 (192.168.56.4)
kubernetes-node2 (192.168.56.5)
I am using flannel pod network (I also tried Calico previously with the same result).
When installing the master node I used this command:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.56.3
I deployed an nginx application whose pods are up, one pod per node:
nginx-deployment-64ff85b579-sk5zs 1/1 Running 0 14m 10.244.2.2 kubernetes-node2
nginx-deployment-64ff85b579-sqjgb 1/1 Running 0 14m 10.244.1.2 kubernetes-node1
I exposed them as a ClusterIP service:
sudo kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m
nginx-deployment ClusterIP 10.98.206.211 <none> 80/TCP 14m
Now the problem:
I ssh into kubernetes-node1 and curl the service using the cluster IP:
ssh 192.168.56.4
---
curl 10.98.206.211
Sometimes the request goes fine, returning the nginx welcome page. I can see in the logs that this requests are always answered by the pod in the same node (kubernetes-node1). Some other requests are stuck until they time out. I guess that this ones were sent to the pod in the other node (kubernetes-node2).
The same happens the other way around, when ssh'd into kubernetes-node2 the pod from this node logs the successful requests and the others time out.
I seems there is some kind of networking problem and nodes can't access pods from the other nodes. How can I fix this?
UPDATE:
I downscaled the number of replicas to 1, so now there is only one pod on kubernetes-node2
If I ssh into kubernetes-node2 all curls go fine. When in kubernetes-node1 all requests time out.
UPDATE 2:
kubernetes-master ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::20a0:c7ff:fe6f:8271 prefixlen 64 scopeid 0x20<link>
ether 0a:58:0a:f4:00:01 txqueuelen 1000 (Ethernet)
RX packets 10478 bytes 2415081 (2.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11523 bytes 2630866 (2.6 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:cd:ce:84:a9 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.3 netmask 255.255.255.0 broadcast 192.168.56.255
inet6 fe80::a00:27ff:fe2d:298f prefixlen 64 scopeid 0x20<link>
ether 08:00:27:2d:29:8f txqueuelen 1000 (Ethernet)
RX packets 20784 bytes 2149991 (2.1 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 26567 bytes 26397855 (26.3 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.3.15 netmask 255.255.255.0 broadcast 10.0.3.255
inet6 fe80::a00:27ff:fe09:f08a prefixlen 64 scopeid 0x20<link>
ether 08:00:27:09:f0:8a txqueuelen 1000 (Ethernet)
RX packets 12662 bytes 12491693 (12.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4507 bytes 297572 (297.5 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::c078:65ff:feb9:e4ed prefixlen 64 scopeid 0x20<link>
ether c2:78:65:b9:e4:ed txqueuelen 0 (Ethernet)
RX packets 6 bytes 444 (444.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6 bytes 444 (444.0 B)
TX errors 0 dropped 15 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 464615 bytes 130013389 (130.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 464615 bytes 130013389 (130.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
tunl0: flags=193<UP,RUNNING,NOARP> mtu 1440
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vethb1098eb3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::d8a3:a2ff:fedf:4d1d prefixlen 64 scopeid 0x20<link>
ether da:a3:a2:df:4d:1d txqueuelen 0 (Ethernet)
RX packets 10478 bytes 2561773 (2.5 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11538 bytes 2631964 (2.6 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
kubernetes-node1 ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::5cab:32ff:fe04:5b89 prefixlen 64 scopeid 0x20<link>
ether 0a:58:0a:f4:01:01 txqueuelen 1000 (Ethernet)
RX packets 199 bytes 41004 (41.0 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 331 bytes 56438 (56.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:0f:02:bb:ff txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.4 netmask 255.255.255.0 broadcast 192.168.56.255
inet6 fe80::a00:27ff:fe36:741a prefixlen 64 scopeid 0x20<link>
ether 08:00:27:36:74:1a txqueuelen 1000 (Ethernet)
RX packets 12834 bytes 9685221 (9.6 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9114 bytes 1014758 (1.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.3.15 netmask 255.255.255.0 broadcast 10.0.3.255
inet6 fe80::a00:27ff:feb2:23a3 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:b2:23:a3 txqueuelen 1000 (Ethernet)
RX packets 13263 bytes 12557808 (12.5 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5065 bytes 341321 (341.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.1.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::7815:efff:fed6:1423 prefixlen 64 scopeid 0x20<link>
ether 7a:15:ef:d6:14:23 txqueuelen 0 (Ethernet)
RX packets 483 bytes 37506 (37.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 483 bytes 37506 (37.5 KB)
TX errors 0 dropped 15 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 3072 bytes 269588 (269.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3072 bytes 269588 (269.5 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth153293ec: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::70b6:beff:fe94:9942 prefixlen 64 scopeid 0x20<link>
ether 72:b6:be:94:99:42 txqueuelen 0 (Ethernet)
RX packets 81 bytes 19066 (19.0 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 129 bytes 10066 (10.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
kubernetes-node2 ifconfig
cni0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.244.2.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::4428:f5ff:fe8b:a76b prefixlen 64 scopeid 0x20<link>
ether 0a:58:0a:f4:02:01 txqueuelen 1000 (Ethernet)
RX packets 184 bytes 36782 (36.7 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 284 bytes 36940 (36.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:7f:e9:79:cd txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.5 netmask 255.255.255.0 broadcast 192.168.56.255
inet6 fe80::a00:27ff:feb7:ff54 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:b7:ff:54 txqueuelen 1000 (Ethernet)
RX packets 12634 bytes 9466460 (9.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8961 bytes 979807 (979.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.3.15 netmask 255.255.255.0 broadcast 10.0.3.255
inet6 fe80::a00:27ff:fed8:9210 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:d8:92:10 txqueuelen 1000 (Ethernet)
RX packets 12658 bytes 12491919 (12.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4544 bytes 297215 (297.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.2.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::c832:e4ff:fe3e:f616 prefixlen 64 scopeid 0x20<link>
ether ca:32:e4:3e:f6:16 txqueuelen 0 (Ethernet)
RX packets 111 bytes 8466 (8.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 111 bytes 8466 (8.4 KB)
TX errors 0 dropped 15 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 2940 bytes 258968 (258.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2940 bytes 258968 (258.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
UPDATE 3:
Kubelet logs:
kubernetes-master kubelet logs
IP Routes
Master
kubernetes-master:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.3
Node1
kubernetes-node1:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.4
Node2
kubernetes-node2:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.5
iptables-save:
kubernetes-master iptables-save
Based on your logs and the fact that you had problems only with connections between nodes which use Flannel, I guess you had a problem with Flannel CNI during the installation.
In logs from node1
and master
, I see the following messages:
Error adding network: open /run/flannel/subnet.env: no such file or directory
Error while adding to cni network: open /run/flannel/subnet.env: no such file or directory
The root cause can be in network problem between VMs.
I recommend you to create 2 networks for each instance in your cluster - one with NAT for access to the Internet and one Host-only for in-cluster communication.
As an alternative way - you can use Bridge
mode for interfaces of VMs if your network allows it.
Finally, the only suggestion I can provide - remove all cluster components and initialize cluster one more time using the configuration I mentioned above. That is the fastest way.
I was running into a similar problem with my K8s cluster with Flannel. I had set up the vms with a NAT nic for internet connectivity and a Host-Only nic for node to node communication. Flannel was choosing the NAT nic by default for node to node communication which obviously won't work in this scenario.
I modified the flannel manifest before deploying to set the --iface=enp0s8 argument to the Host-Only nic that should have been chosen (enp0s8 in my case). In your case it looks like enp0s3 would be the correct NIC. Node to node communication worked fine after that.
I failed to note that I also modified the kube-proxy manifest to include the --cluster-cidr=10.244.0.0/16 and --proxy-mode=iptables which appears to be required as well.
Flushed all firewalls with iptables --flush
and iptables -tnat --flush
then restart docker fixed it