`kube-dns` gets wrong endpoint address with flannel, causing pod DNS lookups to fail

6/3/2018

I setup a 3 nodes kubernetes (v1.9.3) cluster on Ubuntu 16.04.

Before setup I cleared the iptables rules, then followed the k8s documentation for flannel, using the following commands to initialize the cluster:

# kubeadm init --apiserver-advertise-address 192.168.56.20 --pod-network-cidr=10.244.0.0/16 --kubernetes-version 1.9.3
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

The commands appeared to succeed:

# kubectl -n kube-system get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-master                      1/1       Running   0          3m
kube-apiserver-master            1/1       Running   0          2m
kube-controller-manager-master   1/1       Running   0          2m
kube-dns-6f4fd4bdf-4c76v         3/3       Running   0          3m
kube-flannel-ds-wbx97            1/1       Running   0          1m
kube-proxy-x65lv                 1/1       Running   0          3m
kube-scheduler-master            1/1       Running   0          2m 

But the problem is that kube-dns seems to have been assigned the wrong service endpoint address, as can be seen with the following commands:

# kubectl get ep kube-dns --namespace=kube-system            
NAME       ENDPOINTS                     AGE
kube-dns   172.17.0.2:53,172.17.0.2:53   3m
root@master:~# kubectl describe service kube-dns -n kube-system           
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       <none>
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         172.17.0.2:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         172.17.0.2:53
Session Affinity:  None
Events:            <none>

172.17.0.2 is the IP address assigned by the Docker bridge (docker0) to the kube-dns container. In a working k8s network setup, kube-dns should have endpoints with addresses from the podSubnet (10.244.0.0/16).
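One way to confirm where the kube-dns pod got its IP from is to compare the pod IP against the pod CIDR (a sketch; the pod name and IPs will differ on your cluster):

```shell
# Show the pod IP for kube-dns; on a healthy flannel cluster it
# should fall inside the podSubnet (10.244.0.0/16), not the
# docker0 range (172.17.0.0/16).
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Compare with the endpoints the Service actually resolved to.
kubectl -n kube-system get ep kube-dns
```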

The effect of the current setup is that no pod has working DNS, even though IP communication is fine.
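A quick way to reproduce the DNS failure from inside a pod (a sketch; the busybox image tag and pod name are just examples):

```shell
# Launch a throwaway pod and try to resolve the in-cluster
# kubernetes Service; with the broken endpoints this lookup
# never gets an answer.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 \
  -- nslookup kubernetes.default
```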

I tried deleting the kube-dns pod to see whether the new kube-dns containers would pick up endpoints from the podSubnet, but they don't.

The startup logs of the 3 kube-dns containers contain no error messages at all.

-- robert
kube-dns
kubernetes

1 Answer

6/12/2018

I think I have found the root cause: the previous kubeadm reset did not remove the cni and flannel.1 interfaces. The next kubeadm init therefore makes kube-dns believe a Kubernetes network plugin is already in place before I apply the flannel yaml.

After I checked for and removed any virtual NICs created by the flannel plugin when tearing down the Kubernetes cluster, the next kubeadm init succeeded without this issue.
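For reference, the teardown I now use looks roughly like this (a sketch, assuming flannel's default interface names cni0 and flannel.1; run as root on each node):

```shell
# Tear down the cluster state first.
kubeadm reset

# Remove the leftover virtual NICs that flannel created;
# otherwise the next kubeadm init reuses the stale network.
ip link set cni0 down && ip link delete cni0
ip link set flannel.1 down && ip link delete flannel.1

# Verify nothing flannel-related is left behind.
ip link show
```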

The same applies to Weave Net, which requires running weave reset to remove the leftover virtual weave NICs.

-- robert
Source: StackOverflow