I have a weird one that is kind of tough to figure out.
I set up a new Kubernetes cluster using VirtualBox and created a deployment for nginx as a test. The issue is that if I try to use curl to connect to the NodePort it assigned, it only sporadically works (like 1 out of every 7 tries). Otherwise, it threw a "no route to host" message yesterday; today I have not seen that, but instead it hangs for some time before displaying the nginx welcome page.
Basics on setup:
Master, Worker1, and Worker2 each have two adapters: a NAT adapter on 10.0.3.0, and a host-only adapter on 192.168.56.0.
I used the /etc/netplan method on Ubuntu to create static IP addresses for the host-only adapters.
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      nameservers:
        addresses: [10.19.1.23, 10.16.147.6]
        search: [domain1.sas.com, domain2.sas.com]
Note the missing gateway (I am not sure what I would put for a host-only adapter, and I was thinking that since all three servers are on the same class C subnet, they do not need one). The default gateway is assigned to the NAT adapter.
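For reference, the host-only adapter stanza (under the same ethernets: section) looks something like this on each node; enp0s8 and the 192.168.56.101 address here are just example values, and each node gets its own address:

    enp0s8:
      dhcp4: no
      addresses: [192.168.56.101/24]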
They can ping each other just fine with no interruption on the 192.168.56.0 network, and apt-get commands work like a charm. It seems like it might be something with how Docker is interacting periodically. Honestly, at this point I have no clue; I am hoping a guru here might know, or have some method to determine this.
The NAT adapter is what has allowed me to use things like apt-get to install packages for this testing.
I originally set up Flannel, and then when it appeared not to be working (it was actually working, just intermittently, as I now know), I installed Weave as well. I mention this because I am not sure whether they are interfering with each other.
The IP routing table:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp0s8
10.0.3.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s8
10.32.0.0       0.0.0.0         255.240.0.0     U     0      0        0 weave
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1
10.244.3.0      10.244.3.0      255.255.255.0   UG    0      0        0 flannel.1
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
My suspicion is that it is intermittent because traffic is perhaps going down the wrong interface, or something like that.
Is it normal to have multiple flannel entries? Should there be a cni0 if I am using flannel and (uh hmm) weave?
It is not clear how to determine what is going on.
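In case it helps with diagnosing, here are the kinds of checks that should show which interface the traffic actually takes (the pod IP below is just a placeholder; substitute a real one):

# ask the kernel which route/interface would be used for a destination
ip route get 192.168.56.102
ip route get 10.244.2.5        # placeholder pod IP

# watch the host-only adapter while curling the NodePort
sudo tcpdump -ni enp0s8 tcp port 30510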
Output example of failure and success:
# curl -v 192.168.56.102:30510
* Expire in 0 ms for 6 (transfer 0x55c667b2b5c0)
* Trying 192.168.56.102...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55c667b2b5c0)
^C
(this one just hangs and does nothing -- I did not see this yesterday)
# curl -v 192.168.56.102:30510
* Expire in 0 ms for 6 (transfer 0x55a0675f85c0)
* Trying 192.168.56.102...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55a0675f85c0)
* Connected to 192.168.56.102 (192.168.56.102) port 30510 (#0)
> GET / HTTP/1.1
> Host: 192.168.56.102:30510
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0
< Date: Fri, 26 Jul 2019 13:14:00 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 17 Apr 2018 13:46:53 GMT
< Connection: keep-alive
< ETag: "5ad5facd-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host 192.168.56.102 left intact
...so, this one worked
I just tried again and got the hang, but instead of hitting Ctrl-C to cancel, I let it continue, and it actually worked. Hmm, so this is behavior I had not seen yesterday. Maybe the ARP cache or something cleared?
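If it is something like a stale ARP entry, I guess checking the neighbour table on the machine running curl while it hangs would show it (a FAILED or INCOMPLETE state for 192.168.56.102 would be the giveaway):

ip neigh show
arp -n   # older equivalent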
Is there something blatantly wrong with what I have set up already that would cause this?
Root cause:
The VirtualBox NAT adapter doesn't forward traffic between VMs; it only provides internet access to them.
The host-only adapter works perfectly, but by default the Flannel CNI uses the adapter with the default route (which is the NAT one). To change this behavior you can specify the correct VM network interface in the Flannel DaemonSet:
(I skipped most of the YAML file content for simplicity)
kind: DaemonSet
metadata:
  name: kube-flannel-ds
spec:
  template:
    spec:
      containers:
      - name: kube-flannel
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=enp0s8   # <----- add this line, with the correct host-only iface name for your nodes
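Once the edited DaemonSet is applied (see the steps below), the flannel pod logs should confirm which interface it picked; for example (the pod name is a placeholder):

kubectl -n kube-system get pods | grep kube-flannel
kubectl -n kube-system logs <kube-flannel-pod-name> | grep -i interface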
Note: You may also need to specify the master node's host-only interface IP address as the Kubernetes API server advertised address during cluster creation. I've also specified the node IP as the node name for convenience:
On the master node:
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address host-only.if.ip.address-of-master-node --node-name host-only.if.ip.address
On worker nodes:
sudo kubeadm join master.IP.address:6443 --token <token> --discovery-token-ca-cert-hash <hash> --node-name host-only.if.ip.address
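For example, with host-only addresses like the ones in the question (the 192.168.56.x addresses below are placeholders; the token and hash come from the kubeadm init output):

# on the master node (host-only IP e.g. 192.168.56.101)
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 192.168.56.101 --node-name 192.168.56.101

# on a worker node (host-only IP e.g. 192.168.56.102)
sudo kubeadm join 192.168.56.101:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --node-name 192.168.56.102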
How to fix:
Remove your previous Flannel and all other network addons, like Weave, from the cluster. One network addon is enough.
kubectl delete -f file-used-to-apply-network-addon-before.yml
Reboot all nodes after that to remove the flannel/weave/other interfaces from the nodes.
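Alternatively, the leftover interfaces seen in the routing table above can be deleted by hand on each node (only remove the ones that actually exist there):

sudo ip link delete cni0
sudo ip link delete flannel.1
sudo ip link delete weave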
Download the Flannel YAML recommended by the documentation:
wget https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
Edit kube-flannel.yml with your favorite editor, adding the line I've mentioned before:
(I've used Ubuntu as the node OS, so in my case the host-only adapter was enp0s8)
- --iface=enp0s8
Save the file and apply it to the cluster:
kubectl apply -f kube-flannel.yml
After a couple of minutes, connections between pods on different nodes should work fine.
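To verify, the pods should get 10.244.x.x addresses on every node, and the NodePort from the question should now answer consistently (address and port below are the ones from the question):

kubectl get pods --all-namespaces -o wide
curl -s 192.168.56.102:30510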