I have a Kubernetes cluster (v1.7.3) set up behind a corporate network. Everything looks fine except that pods are unable to resolve service names, and a pod can talk only to its own host OS, not to the other hosts in the cluster.
Here is the list of pods in the kube-system namespace. The system pods are all running as expected.
[user@xxxxxx ~]$ kubectl get po -o wide -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE   IP              NODE
etcd-loadbalancer                       1/1     Running   2          7d    192.168.1.102   loadbalancer
kube-apiserver-loadbalancer             1/1     Running   2          7d    192.168.1.102   loadbalancer
kube-controller-manager-loadbalancer    1/1     Running   2          7d    192.168.1.102   loadbalancer
kube-dns-2425271678-gr0fc               3/3     Running   6          7d    10.244.0.7      loadbalancer
kube-flannel-ds-4pr1s                   2/2     Running   3          5d    192.168.1.103   gateway1
kube-flannel-ds-5zrmx                   2/2     Running   1          38m   192.168.1.101   gateway2
kube-flannel-ds-cb3ng                   2/2     Running   6          7d    192.168.1.102   loadbalancer
kube-flannel-ds-g3fgn                   2/2     Running   0          38m   192.168.1.104   gateway3
kube-proxy-ck2mb                        1/1     Running   2          7d    192.168.1.102   loadbalancer
kube-proxy-gvfbp                        1/1     Running   1          5d    192.168.1.103   gateway1
kube-proxy-w0k1k                        1/1     Running   0          38m   192.168.1.104   gateway3
kube-proxy-w2h9b                        1/1     Running   0          38m   192.168.1.101   gateway2
kube-scheduler-loadbalancer             1/1     Running   2          7d    192.168.1.102   loadbalancer
kubernetes-dashboard-3313488171-pbsjj   1/1     Running   2          6d    10.244.0.8      loadbalancer
The troubleshooting output from a test pod:
root@test-1425111236-dht4w:/# nslookup kubernetes.default
;; connection timed out; no servers could be reached
The resolv.conf on the pod (the pod is running in a new namespace):
root@test-1425111236-dht4w:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search <new-namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
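To make clear what that resolv.conf means for the lookup above, here is a minimal sketch of how a glibc-style resolver expands a short name using the `search` list and `ndots`. The function name `candidate_names` is only for illustration, and the namespace stays as the placeholder from my output. Every one of these candidates is sent to 10.96.0.10, which is the server that times out.

```python
def candidate_names(name, search, ndots):
    """Return the names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot means the name is already fully qualified.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name + "."] + expanded
    # Too few dots: walk the search list first, then try the name as given.
    return expanded + [name + "."]


# Values copied from the resolv.conf above; the namespace is a placeholder.
search = [
    "<new-namespace>.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
]

for candidate in candidate_names("kubernetes.default", search, ndots=5):
    print(candidate)
```

With `ndots:5`, `kubernetes.default` has only one dot, so the search domains are tried before the bare name, and the second candidate, `kubernetes.default.svc.cluster.local`, is the one I would expect to resolve.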
Update: adding more info.
Service status:
[user@xxxxx ~]$ kubectl describe svc -n kube-system kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns 53/UDP
Endpoints:         10.244.0.10:53
Port:              dns-tcp 53/TCP
Endpoints:         10.244.0.10:53
Session Affinity:  None
Events:            <none>
The error log from the kube-flannel pod:
I0816 00:43:39.605812 1 main.go:446] Determining IP address of default interface
I0816 00:43:39.609627 1 main.go:459] Using interface with name enp3s0 and address 192.168.1.103
I0816 00:43:39.609778 1 main.go:476] Defaulting external address to interface address (192.168.1.103)
I0816 00:43:39.693979 1 kube.go:130] Waiting 10m0s for node controller to sync
I0816 00:43:39.694156 1 kube.go:283] Starting kube subnet manager
I0816 00:43:40.694888 1 kube.go:137] Node controller sync successful
I0816 00:43:40.695057 1 main.go:226] Created subnet manager: Kubernetes Subnet Manager - gateway1
I0816 00:43:40.695187 1 main.go:229] Installing signal handlers
I0816 00:43:40.695539 1 main.go:330] Found network config - Backend type: vxlan
I0816 00:43:40.781458 1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0816 00:43:40.794618 1 ipmasq.go:51] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I0816 00:43:40.807611 1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
I0816 00:43:40.828642 1 ipmasq.go:51] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I0816 00:43:40.844208 1 main.go:279] Wrote subnet file to /run/flannel/subnet.env
I0816 00:43:40.844382 1 main.go:284] Finished starting backend.
I0816 00:43:40.844559 1 vxlan_network.go:56] Watching for L3 misses
I0816 00:43:40.844664 1 vxlan_network.go:64] Watching for new subnet leases
E0816 02:38:53.404701 1 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:284: Failed to watch *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=630658&timeoutSeconds=395&watch=true: unexpected EOF
E0816 02:38:54.408833 1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:284: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
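To keep the addresses in the outputs above straight, here is a small sketch that sorts them by network. The pod range 10.244.0.0/16 comes from the flannel iptables rules in the log. The service range 10.96.0.0/12 is an assumption on my part (as far as I know it is the kubeadm default); it does not appear in any of my output. The `classify` helper is just illustrative.

```python
import ipaddress

# Pod network, taken from the flannel iptables rules in the log above.
POD_NETWORK = ipaddress.ip_network("10.244.0.0/16")
# ASSUMPTION: kubeadm's default service range; not confirmed by my output.
SERVICE_NETWORK = ipaddress.ip_network("10.96.0.0/12")


def classify(address):
    """Label an address as a pod IP, a service IP, or something else."""
    ip = ipaddress.ip_address(address)
    if ip in POD_NETWORK:
        return "pod"
    if ip in SERVICE_NETWORK:
        return "service"
    return "other"


# Addresses that appear in the outputs above.
for addr in ["10.96.0.10", "10.96.0.1", "10.244.0.10", "192.168.1.103"]:
    print(addr, classify(addr))
```

If that assumption holds, both addresses that fail (10.96.0.10 for DNS from the test pod and 10.96.0.1 for the API server from flannel) are service IPs rather than pod or host IPs.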
My question is: what could be causing this, and where should I start troubleshooting? Many thanks.