I have a Kubernetes v1.5.2 cluster installed with kops and using the weave network plugin. I noticed that my Kubernetes services sometimes become unreachable from within a pod on my cluster.
I went through the entire article about troubleshooting services: https://kubernetes.io/docs/admin/cluster-troubleshooting/
I can confirm that everything is performing as expected but sometimes it's not. This is a curl from within a pod of the cluster trying to reach a service using its IP address; the service is backed by 5 endpoints, all up and running:
> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
* Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host
This is the first time I have set up a cluster with kops and weave, and the first time I have seen this. If anyone has a clue how to debug this, that would be awesome!
kube-proxy is correctly registering my service:
I0210 23:09:41.070508 6 proxier.go:472] Adding new service "my_app/my_app:http" at 100.65.135.200:80/TCP
My pod IPs don't overlap with the cluster's.
However, I'm seeing some weird logs in the weave-kube container on the 2 nodes of my cluster:
INFO: 2017/02/11 12:14:10.959122 Discovered remote MAC b2:3e:c7:99:16:de at ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:10.959348 Captured frame from MAC (b2:3e:c7:99:16:de) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:14:39.140186 Captured frame from MAC (06:b7:eb:e7:fa:0e) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:15:52.273667 Captured frame from MAC (32:f9:43:24:68:ad) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686643 Captured frame from MAC (c2:58:a0:4e:b2:ff) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.686969 Captured frame from MAC (ce:7d:9f:95:66:fb) to (ff:ff:ff:ff:ff:ff) associated with another peer ce:7d:9f:95:66:fb(ip-172-20-55-245)
ERRO: 2017/02/11 12:16:56.687002 Captured frame from MAC (72:85:2b:19:65:b9) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
ERRO: 2017/02/11 12:16:56.687042 Captured frame from MAC (f2:1a:9e:d8:7f:a3) to (ff:ff:ff:ff:ff:ff) associated with another peer c2:58:a0:4e:b2:ff(ip-172-20-75-108)
I'm going to investigate this.
So these weave errors were my problem. Apparently ethtool is required by weave, and it was missing from my image. I updated the AMI to 1.5 and everything is now working as expected.
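For anyone hitting the same thing, a quick way to confirm whether the tool is present is to run something like the following on each node (on the host, not inside a pod). This is only a minimal sketch: the helper names `have_tool` and `report_tool` are made up for illustration, and it checks nothing beyond whether the binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of weave or kops).

# have_tool NAME: succeeds if NAME can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tool NAME: print whether NAME is present on this host.
report_tool() {
    if have_tool "$1"; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# ethtool is the binary that was missing from the image in this case.
report_tool ethtool
```

If this prints `ethtool: MISSING` on a node, that node's image is a likely suspect.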
"everything is performing as expected but sometimes it's not"
It would be good to get some more detail to characterise this - is it one pod fails while others work, or all pods sometimes work and sometimes fail?
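One rough way to gather that detail is to run the same probe loop from several pods and compare the failure counts. The sketch below assumes curl is available in the pod; the function names, the 2-second timeout, and the attempt count are arbitrary choices for illustration. Note that it only counts connection-level failures (such as "No route to host"), not HTTP error statuses.

```shell
#!/bin/sh
# Sketch only: names, timeout, and attempt count are assumptions.

# probe IP: one bounded HTTP attempt; returns non-zero if the connection fails.
probe() {
    curl -s -o /dev/null --max-time 2 "http://$1/"
}

# count_failures IP N: run N probes against IP and print how many failed.
count_failures() {
    ip="$1"
    attempts="$2"
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        probe "$ip" || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}

# Example, using the service IP from the question:
#   echo "$(hostname): $(count_failures 100.65.135.200 20) of 20 failed"
```

If one pod fails on every attempt while the others never fail, that points at that pod or its node; if every pod fails occasionally, the problem is more likely in the shared path.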
However, some additional things to check:
The ultimate step is to look at the network packets: run tcpdump -n -i weave while you run the curl test; if you don't see anything there, then run the dump on the pod's veth.
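Finding the pod's veth on the host can be fiddly. One common approach, sketched here under the assumption that the pod's interface is eth0 and that the host shows the peer as a name like `veth...@ifN` in `ip -o link`, is to read the peer interface index from inside the pod and look it up on the node. The function name `veth_for_index` is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the usual `ip -o link` format "IDX: NAME@ifN: <FLAGS> ...".

# veth_for_index IDX: read `ip -o link` output on stdin and print the
# name of the interface whose index is IDX, with any "@ifN" suffix removed.
veth_for_index() {
    awk -F': ' -v idx="$1" '$1 == idx { sub(/@.*/, "", $2); print $2 }'
}

# Usage on the node hosting the pod (placeholders left unfilled):
#   idx=$(kubectl exec <pod> -- cat /sys/class/net/eth0/iflink)
#   veth=$(ip -o link | veth_for_index "$idx")
#   tcpdump -n -i "$veth" tcp port 80
```

The `tcp port 80` filter matches the port used in the curl test above; drop it if you want to see all traffic, including ARP.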