Tough One related to Kubernetes and Setup

7/26/2019

I have a weird one that is kind of tough to figure out.

I set up a new Kubernetes cluster using VirtualBox and created a Deployment for nginx as a test. The issue is that when I use curl to connect to the NodePort it assigned, it only works sporadically (about 1 out of every 7 tries). Otherwise, yesterday it threw a "no route to host" message; today I have not seen that, but instead it hangs for some time before displaying the nginx welcome page.

Basics on setup:

Master, Worker1, and Worker2 each have two adapters: a NAT adapter on 10.0.3.0 and a host-only adapter on 192.168.56.0.

I used the /etc/netplan method on Ubuntu to create static IP addresses for the host-only adapters.

network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      nameservers:
        addresses: [10.19.1.23, 10.16.147.6]
        search: [domain1.sas.com, domain2.sas.com]
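
The host-only adapter on each node gets a static stanza along these lines, under the same ethernets: key (192.168.56.102 is just an example; each node has its own address):

    enp0s8:
      dhcp4: no
      addresses: [192.168.56.102/24]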

Note the missing gateway. I was not sure what I would put for a host-only adapter, and since all three servers are on the same /24 subnet, I figured they do not need one. The default gateway is assigned to the NAT adapter.

They can ping each other just fine, with no interruption, on the 192.168.56.0 network, and apt-get commands work like a charm. My guess is it is something in how Docker or the cluster networking interacts periodically, but honestly, at this point I have no clue. I am hoping a guru here might know, or have some method to figure this out.

The NAT adapter is what has allowed me to use things like apt-get to install packages for this testing.

I originally set up Flannel, and then, when it appeared not to be working (it was actually working, just intermittently, as I now know), I installed Weave as well. I mention this because I am not sure whether they are interfering with each other.
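
For what it's worth, these should show whether both addons are actually deployed side by side (standard locations, as far as I know):

kubectl get pods -n kube-system -o wide | grep -E 'flannel|weave'
ls /etc/cni/net.d/

As far as I understand, the kubelet only uses the lexicographically first config file in /etc/cni/net.d/ on each node, so two addons can conflict without any obvious error.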

The IP routing table:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp0s8
10.0.3.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s8
10.32.0.0       0.0.0.0         255.240.0.0     U     0      0        0 weave
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1
10.244.3.0      10.244.3.0      255.255.255.0   UG    0      0        0 flannel.1
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

My suspicion is that it is intermittent because traffic is perhaps going down the wrong interface, or something like that.

Is it normal to have multiple flannel.1 entries? Should there be a cni0 entry at all if I am using Flannel and (uh hmm) Weave?

It is not clear to me how to determine what is going on.
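
One check that might help (just a guess on my part) is asking the kernel which route it would actually pick for a given destination; 10.244.2.10 below is just an example pod IP from the 10.244.2.0 entry above:

ip route get 10.244.2.10
ip route get 192.168.56.102

If pod addresses resolve to a different dev than expected (e.g. weave instead of flannel.1), that would support the wrong-interface theory.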

Output example of failure and success:

# curl -v 192.168.56.102:30510
* Expire in 0 ms for 6 (transfer 0x55c667b2b5c0)
*   Trying 192.168.56.102...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55c667b2b5c0)



^C

(this one just hangs and does nothing; I did not see this yesterday)

# curl -v 192.168.56.102:30510
* Expire in 0 ms for 6 (transfer 0x55a0675f85c0)
*   Trying 192.168.56.102...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55a0675f85c0)
* Connected to 192.168.56.102 (192.168.56.102) port 30510 (#0)
> GET / HTTP/1.1
> Host: 192.168.56.102:30510
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0
< Date: Fri, 26 Jul 2019 13:14:00 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 17 Apr 2018 13:46:53 GMT
< Connection: keep-alive
< ETag: "5ad5facd-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host 192.168.56.102 left intact

...so, this one worked

I just tried again and got the hang, but instead of pressing Ctrl-C to cancel, I let it continue, and it actually worked. Hmm, this is a behavior I had not seen yesterday. Maybe an ARP cache entry cleared or something?
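
If someone wants to test the ARP theory, I believe the neighbor cache on the client can be inspected and flushed like this (enp0s8 being the interface facing the 192.168.56.0 network on my client):

ip neigh show
sudo ip neigh flush dev enp0s8

If a flush makes the next curl succeed immediately, stale ARP entries would be a likely contributor.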

Is there something blatantly wrong with the setup I have described that would cause this?

-- archcutbank
kubernetes
ubuntu
virtualbox

1 Answer

8/7/2019

Root cause:
The VirtualBox NAT adapter doesn't forward traffic between VMs; it only provides internet access to them.

The host-only adapter works perfectly, but by default Flannel uses the interface that carries the default route (which here is the NAT adapter). To change this behavior, you can specify the correct VM network interface in the Flannel DaemonSet:
(I skipped most of the YAML file content for simplicity)

kind: DaemonSet
metadata:
  name: kube-flannel-ds
spec:
  template:
    spec:
      containers:
      - name: kube-flannel
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=enp0s8     # <----- Add this line, using the host-only iface name that is correct on all nodes
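
If you don't want to re-download and edit the full manifest, the same argument can also be appended to an already-deployed DaemonSet with a JSON patch (adjust the DaemonSet name to whatever kubectl get ds -n kube-system shows in your cluster):

    kubectl -n kube-system patch ds kube-flannel-ds --type=json \
      -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--iface=enp0s8"}]'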

Note: you may also need to specify the master node's host-only interface IP address as the Kubernetes API server advertise address during cluster creation. I've also used the node IP as the node name for convenience:

On the master node:

sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address host-only.if.ip.address-of-master-node --node-name host-only.if.ip.address

On worker nodes:

sudo kubeadm join master.IP.address:6443 --token <token> --discovery-token-ca-cert-hash <hash> --node-name host-only.if.ip.address
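
To double-check that the nodes registered with the host-only addresses, you can inspect the recorded internal IPs; the INTERNAL-IP column should show the 192.168.56.x addresses, not the NAT ones:

    kubectl get nodes -o wide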

How to fix:

  1. Remove your previous Flannel deployment and all other network addons like Weave from the cluster. One network addon is enough.

    kubectl delete -f file-used-to-apply-network-addon-before.yml

Reboot all nodes after that to remove the flannel/weave/other interfaces left on the nodes.

  2. Download the Flannel YAML recommended by the documentation:

    wget https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
  3. Edit kube-flannel.yml with your favorite editor, adding the line I mentioned before:
    (I used Ubuntu as the node OS, so in my case the host-only adapter was enp0s8)

    - --iface=enp0s8
  4. Save the file and apply it to the cluster:

    kubectl apply -f kube-flannel.yml

After a couple of minutes, connections between pods on different nodes should work fine.
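
To verify the setting took effect, check that the Flannel pods restarted cleanly and that flanneld selected the right interface (the app=flannel label comes from the stock manifest; yours may differ):

    kubectl -n kube-system get pods -l app=flannel -o wide
    kubectl -n kube-system logs ds/kube-flannel-ds | grep -i interface

The logs should mention enp0s8 as the chosen interface, and the NodePort curl test from the question should then succeed consistently.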

-- VAS
Source: StackOverflow