Kubernetes services sometimes unreachable

2/10/2017

I have a Kubernetes v1.5.2 cluster installed with kops, using the Weave network plugin. I noticed that my Kubernetes services sometimes become unreachable from within a pod on my cluster.

I went through the entire article about troubleshooting services: https://kubernetes.io/docs/admin/cluster-troubleshooting/. I can confirm that everything is performing as expected but sometimes it's not. Below is a curl from within a pod of the cluster trying to reach a service by its IP address; the service is backed by 5 endpoints, all up and running:

> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
* Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host

That's the first time I've set up a cluster with kops and Weave, and the first time I've seen this. If anyone has a clue how to debug this, that would be awesome!

Update

  • kube-proxy is correctly registering my service: I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP

  • My pod IPs don't overlap with the cluster's
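For reference, a minimal sketch of how to verify both points, assuming a stock kops node layout (the kube-proxy log path may differ on other setups):

> grep "Adding new service" /var/log/kube-proxy.log
> kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'   # pod ranges, to compare against the service IP range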

I'm seeing some weird logs, however, in the weave-kube container on the 2 nodes of my cluster:

INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
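A minimal sketch for inspecting Weave's peer state while these errors occur, assuming the default weave-kube setup where Weave Net serves a status API on localhost port 6784 (run on the affected node):

> curl -s http://127.0.0.1:6784/status
> curl -s http://127.0.0.1:6784/status/connections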

Gonna investigate this

Update 2

So these Weave errors were my problem. Apparently Weave requires ethtool, and it was missing from my image. I updated the AMI to 1.5 and everything is now working as expected.
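If you suspect the same issue, a quick sanity check on each node (just a sketch) is to confirm the tool is actually present before swapping images:

> which ethtool || echo "ethtool is missing"
> ethtool --version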

-- rmonjo
kops
kubernetes
service

1 Answer

2/11/2017

"everything is performing as expected but sometimes it's not"

It would be good to get some more detail to characterise this: does one pod fail while others work, or do all pods sometimes work and sometimes fail?

However, some additional things to check:

  1. Are your virtual ethernet devices being disconnected from the bridge? See https://github.com/weaveworks/weave/issues/2601
  2. Does your pod IP address space overlap the cluster IP address space?
  3. Check that 100.65.135.200 is mapped by kube-proxy (that part is described in https://kubernetes.io/docs/admin/cluster-troubleshooting/). Commands for checks 1 and 3 are sketched below the list.
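A minimal sketch of how to run checks 1 and 3 from a node; it assumes the default weave bridge name and kube-proxy in iptables mode (brctl comes from bridge-utils):

> brctl show weave                                # each running pod should have a vethwe* device attached
> ip link show master weave                       # same information without bridge-utils
> sudo iptables -t nat -S | grep 100.65.135.200   # kube-proxy's NAT rule for the service IP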

The ultimate step is to look at the network packets: run tcpdump -n -i weave while you run the curl test; if you don't see anything there, run the dump on the pod's veth.
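For example (the veth name below is hypothetical; find the real one with ip link show master weave):

> tcpdump -n -i weave host 100.65.135.200
> tcpdump -n -i vethwepl0a1b2c3 host 100.65.135.200   # substitute the pod's actual veth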

-- Bryan
Source: StackOverflow