kubernetes pod kube-dns keeps restarting

8/12/2019

I set up a k8s cluster with one node and found that the kube-dns pod keeps restarting:

$ kubectl -n kube-system get pods
NAME                                       READY     STATUS    RESTARTS   AGE
kube-dns-1806975333-xjbgr                  2/3       CrashLoopBackOff   74         6h

or

kube-dns-1806975333-xjbgr                  3/3       Running   106        9h
...

When READY is 3/3, everything works well, but the pod still keeps restarting, at a rate of about 10 times per hour.

I searched around and found several answers to this issue, such as "kubernetes DNS fails", but they don't apply to my case. The /etc/resolv.conf file on my host is shown below, and it looks fine to me.

$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.100.0.10
nameserver 192.168.200.1

$ kubectl -n kube-system get service -o wide
NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE       SELECTOR
kube-dns               10.100.0.10     <none>        53/UDP,53/TCP   10h       k8s-app=kube-dns

The logs of the dnsmasq container repeatedly show 'Maximum number of concurrent DNS queries reached':

$ kk logs  kube-dns-1806975333-xjbgr -c dnsmasq
I0812 10:44:54.206829    2393 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0812 10:44:54.206959    2393 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0812 10:44:54.301015    2393 nanny.go:111]
W0812 10:44:54.301050    2393 nanny.go:112] Got EOF from stdout
I0812 10:44:54.301027    2393 nanny.go:108] dnsmasq[2412]: started, version 2.76 cachesize 1000
I0812 10:44:54.301071    2393 nanny.go:108] dnsmasq[2412]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0812 10:44:54.301088    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301093    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301096    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301100    2393 nanny.go:108] dnsmasq[2412]: reading /etc/resolv.conf
I0812 10:44:54.301103    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301120    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301123    2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301127    2393 nanny.go:108] dnsmasq[2412]: using nameserver 10.100.0.10#53
I0812 10:44:54.301134    2393 nanny.go:108] dnsmasq[2412]: using nameserver 192.168.200.1#53
I0812 10:44:54.301138    2393 nanny.go:108] dnsmasq[2412]: read /etc/hosts - 7 addresses
I0812 10:44:55.207448    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:05.227722    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:15.243378    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:25.259829    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:35.272106    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:45.293486    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:55.316141    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:46:05.336765    2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
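
To see at a glance which general upstream servers dnsmasq picked up from /etc/resolv.conf, the startup lines of the log can be filtered. This is only a minimal sketch: the function name dnsmasq_upstreams is made up for illustration, and it assumes the log format shown above ("using nameserver ADDRESS#PORT", with domain-specific servers marked by "for domain").

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or dnsmasq).
# Reads a dnsmasq container log on stdin and prints the upstream servers
# that are NOT restricted to a specific domain, one address per line.
dnsmasq_upstreams() {
    grep 'using nameserver' \
        | grep -v 'for domain' \
        | sed 's/.*using nameserver \([^#]*\)#.*/\1/' \
        | sort -u
}
```

It could be fed with the same log as above, for example: kubectl -n kube-system logs kube-dns-1806975333-xjbgr -c dnsmasq | dnsmasq_upstreams. For the log shown here, the addresses reported by dnsmasq are 10.100.0.10 and 192.168.200.1.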

My env:

$ uname -a
Linux cloudland-master-1 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Any help would be appreciated.

-- Tan Jinfu
dns
dnsmasq
kubernetes

1 Answer

8/13/2019

It turns out that the DNS server IP originally configured on the node does not actually provide DNS service. After changing it to a working one, the symptom disappeared. It seems that dnsmasq tried to look up external domain names through that IP, failed, and then got killed. There are no logs about this; I found it by chance. Please comment if you know the reason behind it.
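
One possible explanation, offered as an assumption and not confirmed by anything in the logs: the node's /etc/resolv.conf listed 10.100.0.10, which is also the CLUSTER-IP of the kube-dns service itself, and the dnsmasq log shows that address being used as an upstream nameserver. If so, queries for external names may have been forwarded back to kube-dns, which would fit the "Maximum number of concurrent DNS queries reached (max: 150)" messages. A small check for this situation could look like the sketch below; the function name resolv_lists_ip is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Kubernetes tooling).
# Usage: resolv_lists_ip FILE IP
# Exits 0 if the resolv.conf-style FILE has a "nameserver IP" line,
# and exits 1 otherwise. Comment lines are ignored because their
# first field is not "nameserver".
resolv_lists_ip() {
    awk -v ip="$2" '
        $1 == "nameserver" && $2 == ip { found = 1 }
        END { exit !found }
    ' "$1"
}
```

On the node, the service address can be read with kubectl -n kube-system get service kube-dns -o jsonpath='{.spec.clusterIP}' and then passed in, for example: resolv_lists_ip /etc/resolv.conf 10.100.0.10 && echo "node resolv.conf points at kube-dns". A match does not prove a forwarding loop, but it is a cheap thing to rule out before looking elsewhere.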

-- Tan Jinfu
Source: StackOverflow