Requests timing out when accesing a Kubernetes clusterIP service

5/17/2018

I am looking for help to troubleshoot this basic scenario that isn't working OK:

Three nodes installed with kubeadm on VirtualBox VMs running on a MacBook:

sudo kubectl get nodes
NAME                STATUS    ROLES     AGE       VERSION
kubernetes-master   Ready     master    4h        v1.10.2
kubernetes-node1    Ready     <none>    4h        v1.10.2
kubernetes-node2    Ready     <none>    34m       v1.10.2

The Virtualbox VMs have 2 adapters: 1) Host-only 2) NAT. The node IP's from the guest computer are:

kubernetes-master (192.168.56.3)
kubernetes-node1  (192.168.56.4)
kubernetes-node2  (192.168.56.5)

I am using flannel pod network (I also tried Calico previously with the same result).

When installing the master node I used this command:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.56.3

I deployed an nginx application whose pods are up, one pod per node:

nginx-deployment-64ff85b579-sk5zs   1/1       Running   0          14m       10.244.2.2   kubernetes-node2
nginx-deployment-64ff85b579-sqjgb   1/1       Running   0          14m       10.244.1.2   kubernetes-node1

I exposed them as a ClusterIP service:

sudo kubectl get services 
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP   22m
nginx-deployment   ClusterIP   10.98.206.211   <none>        80/TCP    14m

Now the problem:

I ssh into kubernetes-node1 and curl the service using the cluster IP:

ssh 192.168.56.4
---
curl 10.98.206.211

Sometimes the request goes fine, returning the nginx welcome page. I can see in the logs that this requests are always answered by the pod in the same node (kubernetes-node1). Some other requests are stuck until they time out. I guess that this ones were sent to the pod in the other node (kubernetes-node2).

The same happens the other way around, when ssh'd into kubernetes-node2 the pod from this node logs the successful requests and the others time out.

I seems there is some kind of networking problem and nodes can't access pods from the other nodes. How can I fix this?

UPDATE:

I downscaled the number of replicas to 1, so now there is only one pod on kubernetes-node2

If I ssh into kubernetes-node2 all curls go fine. When in kubernetes-node1 all requests time out.

UPDATE 2:

kubernetes-master ifconfig

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::20a0:c7ff:fe6f:8271  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:f4:00:01  txqueuelen 1000  (Ethernet)
        RX packets 10478  bytes 2415081 (2.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11523  bytes 2630866 (2.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:cd:ce:84:a9  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.56.3  netmask 255.255.255.0  broadcast 192.168.56.255
        inet6 fe80::a00:27ff:fe2d:298f  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:2d:29:8f  txqueuelen 1000  (Ethernet)
        RX packets 20784  bytes 2149991 (2.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 26567  bytes 26397855 (26.3 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.3.15  netmask 255.255.255.0  broadcast 10.0.3.255
        inet6 fe80::a00:27ff:fe09:f08a  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:09:f0:8a  txqueuelen 1000  (Ethernet)
        RX packets 12662  bytes 12491693 (12.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4507  bytes 297572 (297.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::c078:65ff:feb9:e4ed  prefixlen 64  scopeid 0x20<link>
        ether c2:78:65:b9:e4:ed  txqueuelen 0  (Ethernet)
        RX packets 6  bytes 444 (444.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 444 (444.0 B)
        TX errors 0  dropped 15 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 464615  bytes 130013389 (130.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 464615  bytes 130013389 (130.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethb1098eb3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::d8a3:a2ff:fedf:4d1d  prefixlen 64  scopeid 0x20<link>
        ether da:a3:a2:df:4d:1d  txqueuelen 0  (Ethernet)
        RX packets 10478  bytes 2561773 (2.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11538  bytes 2631964 (2.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

kubernetes-node1 ifconfig

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::5cab:32ff:fe04:5b89  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:f4:01:01  txqueuelen 1000  (Ethernet)
        RX packets 199  bytes 41004 (41.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 331  bytes 56438 (56.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:0f:02:bb:ff  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.56.4  netmask 255.255.255.0  broadcast 192.168.56.255
        inet6 fe80::a00:27ff:fe36:741a  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:36:74:1a  txqueuelen 1000  (Ethernet)
        RX packets 12834  bytes 9685221 (9.6 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9114  bytes 1014758 (1.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.3.15  netmask 255.255.255.0  broadcast 10.0.3.255
        inet6 fe80::a00:27ff:feb2:23a3  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:b2:23:a3  txqueuelen 1000  (Ethernet)
        RX packets 13263  bytes 12557808 (12.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5065  bytes 341321 (341.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::7815:efff:fed6:1423  prefixlen 64  scopeid 0x20<link>
        ether 7a:15:ef:d6:14:23  txqueuelen 0  (Ethernet)
        RX packets 483  bytes 37506 (37.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 483  bytes 37506 (37.5 KB)
        TX errors 0  dropped 15 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3072  bytes 269588 (269.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3072  bytes 269588 (269.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth153293ec: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::70b6:beff:fe94:9942  prefixlen 64  scopeid 0x20<link>
        ether 72:b6:be:94:99:42  txqueuelen 0  (Ethernet)
        RX packets 81  bytes 19066 (19.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 129  bytes 10066 (10.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

kubernetes-node2 ifconfig

cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.244.2.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::4428:f5ff:fe8b:a76b  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:f4:02:01  txqueuelen 1000  (Ethernet)
        RX packets 184  bytes 36782 (36.7 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 284  bytes 36940 (36.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:7f:e9:79:cd  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.56.5  netmask 255.255.255.0  broadcast 192.168.56.255
        inet6 fe80::a00:27ff:feb7:ff54  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:b7:ff:54  txqueuelen 1000  (Ethernet)
        RX packets 12634  bytes 9466460 (9.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8961  bytes 979807 (979.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.3.15  netmask 255.255.255.0  broadcast 10.0.3.255
        inet6 fe80::a00:27ff:fed8:9210  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:d8:92:10  txqueuelen 1000  (Ethernet)
        RX packets 12658  bytes 12491919 (12.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4544  bytes 297215 (297.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.2.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::c832:e4ff:fe3e:f616  prefixlen 64  scopeid 0x20<link>
        ether ca:32:e4:3e:f6:16  txqueuelen 0  (Ethernet)
        RX packets 111  bytes 8466 (8.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 111  bytes 8466 (8.4 KB)
        TX errors 0  dropped 15 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2940  bytes 258968 (258.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2940  bytes 258968 (258.9 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

UPDATE 3:

Kubelet logs:

kubernetes-master kubelet logs

kubernetes-node1 kubelet logs

kubernetes-node2 kubelet logs

IP Routes

Master

kubernetes-master:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100 
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15 
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.3 

Node1

kubernetes-node1:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100 
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15 
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.4 

Node2

kubernetes-node2:~$ ip route
default via 10.0.3.2 dev enp0s8 proto dhcp src 10.0.3.15 metric 100 
10.0.3.0/24 dev enp0s8 proto kernel scope link src 10.0.3.15 
10.0.3.2 dev enp0s8 proto dhcp scope link src 10.0.3.15 metric 100 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.56.0/24 dev enp0s3 proto kernel scope link src 192.168.56.5

iptables-save:

kubernetes-master iptables-save

kubernetes-node1 iptables-save

kubernetes-node2 iptables-save

-- codependent
flannel
kubeadm
kubernetes
kubernetes-service
virtualbox

3 Answers

5/28/2018

Based on your logs and the fact that you had problems only with connections between nodes which use Flannel, I guess you had a problem with Flannel CNI during the installation.

In logs from node1 and master, I see the following messages:

Error adding network: open /run/flannel/subnet.env: no such file or directory
Error while adding to cni network: open /run/flannel/subnet.env: no such file or directory

The root cause can be in network problem between VMs.

I recommend you to create 2 networks for each instance in your cluster - one with NAT for access to the Internet and one Host-only for in-cluster communication.

As an alternative way - you can use Bridge mode for interfaces of VMs if your network allows it.

Finally, the only suggestion I can provide - remove all cluster components and initialize cluster one more time using the configuration I mentioned above. That is the fastest way.

-- Anton Kostenko
Source: StackOverflow

8/15/2018

I was running into a similar problem with my K8s cluster with Flannel. I had set up the vms with a NAT nic for internet connectivity and a Host-Only nic for node to node communication. Flannel was choosing the NAT nic by default for node to node communication which obviously won't work in this scenario.

I modified the flannel manifest before deploying to set the --iface=enp0s8 argument to the Host-Only nic that should have been chosen (enp0s8 in my case). In your case it looks like enp0s3 would be the correct NIC. Node to node communication worked fine after that.

I failed to note that I also modified the kube-proxy manifest to include the --cluster-cidr=10.244.0.0/16 and --proxy-mode=iptables which appears to be required as well.

-- phR0ze
Source: StackOverflow

2/24/2019

Flushed all firewalls with iptables --flush and iptables -tnat --flush then restart docker fixed it

check this github issue link

-- Rajesh Muraleedharan
Source: StackOverflow