Unable to resolve internal service from a pod

8/16/2017

I have a Kubernetes cluster (v1.7.3) set up behind a corporate network. Everything looks fine except that pods cannot resolve the names of other services. A pod can also reach only the host it runs on, not the other hosts in the cluster.

Here is the list of pods in the kube-system namespace. The system pods are all running as expected.

[user@xxxxxx ~]$ kubectl get po -o wide -n kube-system 
NAME                                    READY     STATUS    RESTARTS   AGE       IP              NODE
etcd-loadbalancer                       1/1       Running   2          7d        192.168.1.102   loadbalancer
kube-apiserver-loadbalancer             1/1       Running   2          7d        192.168.1.102   loadbalancer
kube-controller-manager-loadbalancer    1/1       Running   2          7d        192.168.1.102   loadbalancer
kube-dns-2425271678-gr0fc               3/3       Running   6          7d        10.244.0.7      loadbalancer
kube-flannel-ds-4pr1s                   2/2       Running   3          5d        192.168.1.103   gateway1
kube-flannel-ds-5zrmx                   2/2       Running   1          38m       192.168.1.101   gateway2
kube-flannel-ds-cb3ng                   2/2       Running   6          7d        192.168.1.102   loadbalancer
kube-flannel-ds-g3fgn                   2/2       Running   0          38m       192.168.1.104   gateway3
kube-proxy-ck2mb                        1/1       Running   2          7d        192.168.1.102   loadbalancer
kube-proxy-gvfbp                        1/1       Running   1          5d        192.168.1.103   gateway1
kube-proxy-w0k1k                        1/1       Running   0          38m       192.168.1.104   gateway3
kube-proxy-w2h9b                        1/1       Running   0          38m       192.168.1.101   gateway2
kube-scheduler-loadbalancer             1/1       Running   2          7d        192.168.1.102   loadbalancer
kubernetes-dashboard-3313488171-pbsjj   1/1       Running   2          6d        10.244.0.8      loadbalancer

The troubleshooting output from a test pod:

root@test-1425111236-dht4w:/# nslookup kubernetes.default
;; connection timed out; no servers could be reached

The resolv.conf in the pod (the pod runs in a new namespace):

root@test-1425111236-dht4w:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search <new-namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
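
For reference, the search and ndots:5 lines decide which fully qualified names the resolver tries for kubernetes.default. The sketch below approximates the rule described in resolv.conf(5); candidate_names is a hypothetical helper written for illustration, "new-namespace" stands in for the redacted namespace name, and nslookup may apply the search list slightly differently from the C library resolver.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + searched
    # Too few dots: walk the search list first, then try the name as-is.
    return searched + [absolute]


# Values copied from the pod's /etc/resolv.conf; "new-namespace" is a
# placeholder for the redacted namespace name.
search = ["new-namespace.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for fqdn in candidate_names("kubernetes.default", search, ndots=5):
    print(fqdn)
```

With one dot and ndots:5, the search list is tried first, so kubernetes.default.svc.cluster.local is the second candidate. Because the nslookup error is a timeout reaching the server rather than a "not found" answer, the search list itself is probably not the cause.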

Update: adding more information.

Service status:

[user@xxxxx ~]$ kubectl describe svc -n kube-system kube-dns
Name:                   kube-dns
Namespace:              kube-system
Labels:                 k8s-app=kube-dns
                        kubernetes.io/cluster-service=true
                        kubernetes.io/name=KubeDNS
Annotations:            <none>
Selector:               k8s-app=kube-dns
Type:                   ClusterIP
IP:                     10.96.0.10
Port:                   dns     53/UDP
Endpoints:              10.244.0.10:53
Port:                   dns-tcp 53/TCP
Endpoints:              10.244.0.10:53
Session Affinity:       None
Events:                 <none>
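
One way to narrow this down is to compare reachability of the service ClusterIP with the endpoint behind it, using the dns-tcp port listed above. The following is only a sketch: tcp_probe is a hypothetical helper, it assumes Python is available inside the test pod, and it checks TCP only, so it says nothing definite about DNS over UDP.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout' or an error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Addresses taken from the kubectl describe output above.
TARGETS = [
    ("kube-dns ClusterIP", "10.96.0.10", 53),
    ("kube-dns endpoint (pod IP)", "10.244.0.10", 53),
]

if __name__ == "__main__":
    for label, host, port in TARGETS:
        print(f"{label:28} {host}:{port} -> {tcp_probe(host, port)}")
```

If the endpoint answers but the ClusterIP does not, the kube-proxy rules on that node are the more likely suspect. If neither answers from a pod on another node, the pod-to-pod overlay network is worth checking first.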

The log from the kube-flannel pod on gateway1, with the errors at the end:

I0816 00:43:39.605812       1 main.go:446] Determining IP address of default interface
I0816 00:43:39.609627       1 main.go:459] Using interface with name enp3s0 and address 192.168.1.103
I0816 00:43:39.609778       1 main.go:476] Defaulting external address to interface address (192.168.1.103)
I0816 00:43:39.693979       1 kube.go:130] Waiting 10m0s for node controller to sync
I0816 00:43:39.694156       1 kube.go:283] Starting kube subnet manager
I0816 00:43:40.694888       1 kube.go:137] Node controller sync successful
I0816 00:43:40.695057       1 main.go:226] Created subnet manager: Kubernetes Subnet Manager - gateway1
I0816 00:43:40.695187       1 main.go:229] Installing signal handlers
I0816 00:43:40.695539       1 main.go:330] Found network config - Backend type: vxlan
I0816 00:43:40.781458       1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0816 00:43:40.794618       1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I0816 00:43:40.807611       1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
I0816 00:43:40.828642       1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I0816 00:43:40.844208       1 main.go:279] Wrote subnet file to /run/flannel/subnet.env
I0816 00:43:40.844382       1 main.go:284] Finished starting backend.
I0816 00:43:40.844559       1 vxlan_network.go:56] Watching for L3 misses
I0816 00:43:40.844664       1 vxlan_network.go:64] Watching for new subnet leases
E0816 02:38:53.404701       1 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:284: Failed to watch *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=630658&timeoutSeconds=395&watch=true: unexpected EOF
E0816 02:38:54.408833       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:284: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused 
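
To relate the addresses in this post to the networks named in the iptables rules above, a small sketch with Python's ipaddress module can help. The pod network 10.244.0.0/16 comes from the log. Treating 10.244.1.0/24 as gateway1's own subnet lease is an inference from the RETURN rule, and the service range 10.96.0.0/12 is an assumption (the kubeadm default), not something shown in the output.

```python
import ipaddress

POD_NETWORK = ipaddress.ip_network("10.244.0.0/16")      # from the flannel log
LOCAL_LEASE = ipaddress.ip_network("10.244.1.0/24")      # inferred: gateway1's lease
SERVICE_NETWORK = ipaddress.ip_network("10.96.0.0/12")   # ASSUMPTION: kubeadm default


def classify(address):
    """Describe where an address sits, as seen from gateway1."""
    ip = ipaddress.ip_address(address)
    if ip in LOCAL_LEASE:
        return "pod on this node"
    if ip in POD_NETWORK:
        return "pod on another node (crosses the VXLAN overlay)"
    if ip in SERVICE_NETWORK:
        return "service ClusterIP (translated by kube-proxy rules)"
    return "outside the cluster networks"


# kube-dns endpoint, kube-dns ClusterIP, API server ClusterIP, master node IP.
for address in ("10.244.0.10", "10.96.0.10", "10.96.0.1", "192.168.1.102"):
    print(f"{address:15} {classify(address)}")
```

Under these assumptions, the kube-dns endpoint 10.244.0.10 is on a different node from gateway1, so a DNS query from a pod on gateway1 has to cross the VXLAN overlay between hosts.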

What are the possible causes, and where should I start troubleshooting? Many thanks.

-- ichbinblau
dns
kubernetes
networking
