I set up a single-node k8s cluster and found that the kube-dns pod keeps restarting:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
kube-dns-1806975333-xjbgr 2/3 CrashLoopBackOff 74 6h
or
kube-dns-1806975333-xjbgr 3/3 Running 106 9h
...
When READY is 3/3, everything works well, but the pod keeps restarting about 10 times per hour.
I searched around and found several answers to this issue, such as "kubernetes DNS fails", but they don't apply to me. The /etc/resolv.conf file on my host is shown below, and it looks fine.
$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.100.0.10
nameserver 192.168.200.1
$ kubectl -n kube-system get service -o wide
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns 10.100.0.10 <none> 53/UDP,53/TCP 10h k8s-app=kube-dns
The dnsmasq container logs show 'Maximum number of concurrent DNS queries reached':
$ kk logs kube-dns-1806975333-xjbgr -c dnsmasq
I0812 10:44:54.206829 2393 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0812 10:44:54.206959 2393 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0812 10:44:54.301015 2393 nanny.go:111]
W0812 10:44:54.301050 2393 nanny.go:112] Got EOF from stdout
I0812 10:44:54.301027 2393 nanny.go:108] dnsmasq[2412]: started, version 2.76 cachesize 1000
I0812 10:44:54.301071 2393 nanny.go:108] dnsmasq[2412]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0812 10:44:54.301088 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301093 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301096 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301100 2393 nanny.go:108] dnsmasq[2412]: reading /etc/resolv.conf
I0812 10:44:54.301103 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0812 10:44:54.301120 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0812 10:44:54.301123 2393 nanny.go:108] dnsmasq[2412]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0812 10:44:54.301127 2393 nanny.go:108] dnsmasq[2412]: using nameserver 10.100.0.10#53
I0812 10:44:54.301134 2393 nanny.go:108] dnsmasq[2412]: using nameserver 192.168.200.1#53
I0812 10:44:54.301138 2393 nanny.go:108] dnsmasq[2412]: read /etc/hosts - 7 addresses
I0812 10:44:55.207448 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:05.227722 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:15.243378 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:25.259829 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:35.272106 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:45.293486 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:45:55.316141 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
I0812 10:46:05.336765 2393 nanny.go:108] dnsmasq[2412]: Maximum number of concurrent DNS queries reached (max: 150)
My environment:
$ uname -a
Linux cloudland-master-1 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T07:00:21Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"clean", BuildDate:"2017-08-03T06:43:48Z",GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Please help me out here.
It turns out the cause was that the DNS server IP originally configured on the node doesn't actually provide DNS service. After changing it to a working one, the symptom disappeared. It seems that dnsmasq tries to look up external domain names through that IP, fails, and then gets killed. There are no logs about this; I just found it by chance. Please comment if you know the reason behind it.
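Since nothing in the logs points at the upstream server, one way to catch this earlier is to probe each nameserver in the node's resolv.conf directly. Below is a minimal Python sketch of that check, not a definitive tool. The function names (parse_nameservers, build_query, answers_dns, report) and the test domain kubernetes.io are my own assumptions for illustration. It only checks that something replies on UDP port 53, not that the answer is correct.

```python
import socket
import struct


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, ignoring '#' comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def answers_dns(server, name="kubernetes.io", timeout=2.0):
    """Return True if the server sends back a reply with a matching query ID.

    IPv4 only; timeouts and socket errors are treated as "no answer".
    The default domain is an arbitrary choice for this sketch.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]


def report(path="/etc/resolv.conf"):
    """Print one line per configured nameserver saying whether it replied."""
    with open(path) as f:
        for server in parse_nameservers(f.read()):
            status = "answers" if answers_dns(server) else "NO ANSWER"
            print(server, status)
```

Calling report() on the node prints each configured nameserver and whether it replied within the timeout. One thing to keep in mind when reading the result for the setup in the question: 10.100.0.10 is also the CLUSTER-IP of the kube-dns service itself, so probing it from the node tests kube-dns rather than an external resolver.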