I have a Kubernetes cluster running on 4 Raspberry Pi devices, of which 1 is acting as master and the other 3 are working as workers, i.e. w1, w2, and w3. I have started a DaemonSet deployment, so each worker is running a pod of 2 containers. On w2, if I exec into either container of the pod and ping www.google.com from inside it, I get a response. But if I do the same on w1 and w3, it says "temporary failure in name resolution". All the pods in kube-system are running. I am using weave for networking. Below are all the pods in kube-system:
NAME                                READY   STATUS    RESTARTS   AGE
etcd-master-pi                      1/1     Running   1          23h
kube-apiserver-master-pi            1/1     Running   1          23h
kube-controller-manager-master-pi   1/1     Running   1          23h
kube-dns-7b6ff86f69-97vtl           3/3     Running   3          23h
kube-proxy-2tmgw                    1/1     Running   0          14m
kube-proxy-9xfx9                    1/1     Running   2          22h
kube-proxy-nfgwg                    1/1     Running   1          23h
kube-proxy-xbdxl                    1/1     Running   3          23h
kube-scheduler-master-pi            1/1     Running   1          23h
weave-net-7sh5n                     2/2     Running   1          14m
weave-net-c7x8p                     2/2     Running   3          23h
weave-net-mz4c4                     2/2     Running   6          22h
weave-net-qtgmw                     2/2     Running   10         23h
If I start the containers with a plain docker run command instead of through the Kubernetes deployment, I do not see this issue. I think this is because of kube-dns. How can I debug this issue?
This might not be applicable to your scenario, but I wanted to document the solution I found. My issues ended up being related to the flannel network overlay setup on our master nodes.
# kubectl get pods --namespace kube-system
NAME                           READY   STATUS    RESTARTS   AGE
coredns-qwer                   1/1     Running   0          4h54m
coredns-asdf                   1/1     Running   0          4h54m
etcd-h1                        1/1     Running   0          4h53m
etcd-h2                        1/1     Running   0          4h48m
etcd-h3                        1/1     Running   0          4h48m
kube-apiserver-h1              1/1     Running   0          4h53m
kube-apiserver-h2              1/1     Running   0          4h48m
kube-apiserver-h3              1/1     Running   0          4h48m
kube-controller-manager-h1     1/1     Running   2          4h53m
kube-controller-manager-h2     1/1     Running   0          4h48m
kube-controller-manager-h3     1/1     Running   0          4h48m
kube-flannel-ds-amd64-asdf     1/1     Running   0          4h48m
kube-flannel-ds-amd64-qwer     1/1     Running   1          4h48m
kube-flannel-ds-amd64-zxcv     1/1     Running   0          3h51m
kube-flannel-ds-amd64-wert     1/1     Running   0          4h54m
kube-flannel-ds-amd64-sdfg     1/1     Running   1          4h41m
kube-flannel-ds-amd64-xcvb     1/1     Running   1          4h42m
kube-proxy-qwer                1/1     Running   0          4h42m
kube-proxy-asdf                1/1     Running   0          4h54m
kube-proxy-zxcv                1/1     Running   0          4h48m
kube-proxy-wert                1/1     Running   0          4h41m
kube-proxy-sdfg                1/1     Running   0          4h48m
kube-proxy-xcvb                1/1     Running   0          4h42m
kube-scheduler-h1              1/1     Running   1          4h53m
kube-scheduler-h2              1/1     Running   1          4h48m
kube-scheduler-h3              1/1     Running   0          4h48m
tiller-deploy-asdf             1/1     Running   0          4h28m
If I exec'd into any container and pinged google.com from inside it, I got a bad address response:
# ping google.com
ping: bad address 'google.com'
# ip route
default via 10.168.3.1 dev eth0
10.168.3.0/24 dev eth0 scope link src 10.168.3.22
10.244.0.0/16 via 10.168.3.1 dev eth0
This ip route output differs from what ip route shows when run on the master node.
Altering my pod's deployment configuration to include hostNetwork: true allowed me to ping outside my container.
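For reference, here is a minimal sketch of what that looks like in a Deployment manifest; the name, labels, and image are placeholders, and the dnsPolicy line is an extra I'd pair with hostNetwork so cluster DNS still resolves, not something from our original setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment                 # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      hostNetwork: true                     # pod uses the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet    # optional: keep cluster DNS while on the host network
      containers:
      - name: app
        image: busybox                      # placeholder image
        command: ["sleep", "3600"]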
The ip route output from my newly running pod:
# ip route
default via 172.25.10.1 dev ens192 metric 100
10.168.0.0/24 via 10.168.0.0 dev flannel.1 onlink
10.168.1.0/24 via 10.168.1.0 dev flannel.1 onlink
10.168.2.0/24 via 10.168.2.0 dev flannel.1 onlink
10.168.3.0/24 dev cni0 scope link src 10.168.3.1
10.168.4.0/24 via 10.168.4.0 dev flannel.1 onlink
10.168.5.0/24 via 10.168.5.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
172.25.10.0/23 dev ens192 scope link src 172.25.11.35 metric 100
192.168.122.0/24 dev virbr0 scope link src 192.168.122.1
# ping google.com
PING google.com (172.217.6.110): 56 data bytes
64 bytes from 172.217.6.110: seq=0 ttl=55 time=3.488 ms
My associate and I found a number of different websites which advise against setting hostNetwork: true. We then found this issue and are currently investigating it as a possible solution, sans hostNetwork: true.
Usually you'd do this with the --ip-masq flag to flannel, which is false by default and is defined as "setup IP masquerade rule for traffic destined outside of overlay network". That sounds like what you want.
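For context, that flag is passed as an argument to the flanneld container in the kube-flannel DaemonSet. The snippet below is a rough sketch of the relevant section as it appears in stock flannel manifests (the image tag is illustrative and surrounding fields are omitted), not the exact manifest from that issue:

containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.11.0-amd64   # tag is illustrative
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq            # masquerade traffic destined outside the overlay network
  - --kube-subnet-mgr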
It turns out that our flannel network overlay was misconfigured. We needed to ensure that the Network value in flannel's net-conf.json ConfigMap matched our networking.podSubnet (as reported by kubeadm config view). Changing these networks to match alleviated our networking woes. We were then able to remove hostNetwork: true from our deployments.
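A quick way to compare the two values (kube-flannel-cfg is the default ConfigMap name from flannel's manifest and may differ in your cluster):

# Network that flannel carves pod subnets out of
kubectl -n kube-system get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'

# Pod subnet the cluster was initialised with
kubeadm config view | grep -i podSubnet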
You can start by checking whether DNS is working. Run nslookup on kubernetes.default from inside the pod and check whether it resolves:
[root@metrics-master-2 /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Check the local DNS configuration inside the pod:
[root@metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
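If you don't already have a shell open inside the pod, the same two checks can be run from outside with kubectl exec (the pod name here is a placeholder):

kubectl exec -it <pod-name> -- nslookup kubernetes.default
kubectl exec -it <pod-name> -- cat /etc/resolv.conf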
Finally, check the kube-dns container logs while you run the ping command; they will give you the possible reasons why the name is not resolving:
kubectl logs kube-dns-86f4d74b45-7c4ng -c kubedns -n kube-system
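If your cluster runs CoreDNS instead of kube-dns, the DNS pods on kubeadm-built clusters still carry the k8s-app=kube-dns label, so a label selector saves looking up the exact pod name:

kubectl logs -n kube-system -l k8s-app=kube-dns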
Hope this helps.