I currently have a k8s cluster set up on my Ubuntu machine using kubeadm. For the CNI, I am using calico. I am debugging the following DNS issue (I've seen numerous posts about this):
[ERROR] plugin/errors: 2 kubernetes.default. A: read udp 192.168.83.69:59301->172.16.5.2:53: i/o timeout
First of all, I have the following test pod configuration:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
On this pod, dnsutils, DNS resolution seems to work as expected:
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Here is a tshark dump on the calico interface (which, of course, looks normal):
17 10.745406511 192.168.83.77 → 10.96.0.10 DNS 104 Standard query 0x29d6 A kubernetes.default.default.svc.cluster.local
18 10.745625610 10.96.0.10 → 192.168.83.77 DNS 197 Standard query response 0x29d6 No such name A kubernetes.default.default.svc.cluster.local SOA ns.dns.cluster.local
19 10.745902344 192.168.83.77 → 10.96.0.10 DNS 96 Standard query 0x1dda A kubernetes.default.svc.cluster.local
20 10.746111103 10.96.0.10 → 192.168.83.77 DNS 148 Standard query response 0x1dda A kubernetes.default.svc.cluster.local A 10.96.0.1
21 10.746373190 192.168.83.77 → 10.96.0.10 DNS 96 Standard query 0x5a2c AAAA kubernetes.default.svc.cluster.local
22 10.746537515 10.96.0.10 → 192.168.83.77 DNS 189 Standard query response 0x5a2c AAAA kubernetes.default.svc.cluster.local SOA ns.dns.cluster.local
Next, I tried the same thing with a random busybox image:
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh
However, with this busybox image, the same nslookup does not work:
# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
Here is the corresponding tshark output:
1 0.000000000 192.168.83.80 → 10.96.0.10 DNS 78 Standard query 0x0700 A kubernetes.default
2 0.000057953 192.168.83.80 → 10.96.0.10 DNS 78 Standard query 0x0700 AAAA kubernetes.default
3 0.047930496 10.96.0.10 → 192.168.83.80 DNS 153 Standard query response 0x0700 No such name AAAA kubernetes.default SOA a.root-servers.net
4 2.502185605 192.168.83.80 → 10.96.0.10 DNS 78 Standard query 0x0700 AAAA kubernetes.default
5 2.502622008 10.96.0.10 → 192.168.83.80 DNS 153 Standard query response 0x0700 No such name AAAA kubernetes.default SOA a.root-servers.net
The main difference I see is that in the working example the client makes the proper request for the full domain, kubernetes.default.default.svc.cluster.local, while in the second example the request goes out for just kubernetes.default, with no search domain appended.
The /etc/resolv.conf looks identical for both images. Is there anything else that could be affecting this?
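
For reference, this is my understanding of how a resolver that honours the search and ndots settings in /etc/resolv.conf should expand the name. It is only a rough sketch: the function name candidate_names is made up for illustration, and the search list and ndots:5 are the values I would expect in a pod in the default namespace (an assumption, not copied from my pods).

```python
def candidate_names(name, search, ndots=1):
    """Return the names a search-list-aware resolver would try, in order."""
    # A trailing dot marks the name as fully qualified: no search expansion.
    if name.endswith("."):
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    # With at least `ndots` dots the name is tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


# Assumed values: the usual resolv.conf of a pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("kubernetes.default", SEARCH, ndots=5):
    print(candidate)
```

Under those assumptions, the first two candidates match the queries in the dnsutils capture, whereas the busybox capture only ever shows the bare kubernetes.default, as if the search list were never applied.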