I have an OpenShift cluster, and periodically when accessing logs I get errors such as:

worker1-sass-on-prem-origin-3-10 on 10.1.176.130:53: no such host

when kube makes a connection to port 53 on a node. From time to time I also see errors like

tcp: lookup postgres.myapp.svc.cluster.local on 10.1.176.136:53: no such host

inside pods. This makes me think that, when accessing internal service endpoints, pods, clients, and other Kubernetes-related services actually talk to a DNS server that is assumed to be running on the node those pods are scheduled on.
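Errors like these embed the IP of the DNS server that failed to answer, which tells you which node's resolver to inspect. A minimal sketch for pulling that IP out of the error text (assuming the message format shown above):

```shell
# Extract the DNS server IP from a "no such host" error message so you know
# which node's resolver to inspect (sample error taken from above):
err='tcp: lookup postgres.myapp.svc.cluster.local on 10.1.176.136:53: no such host'
echo "$err" | sed -n 's/.* on \([0-9.]*\):53:.*/\1/p'
# -> 10.1.176.136
```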
Looking into one of my pods on a given node, I found the following in resolv.conf (I had to SSH in and run docker exec to get this output, since oc exec wasn't working due to this very issue):
/etc/cfssl $ cat /etc/resolv.conf
nameserver 10.1.176.129
search jim-emea-test.svc.cluster.local svc.cluster.local cluster.local bds-ad.lc opssight.internal
options ndots:5
Thus, it appears that in my cluster, containers have a self-referential resolv.conf entry. This cluster was created with openshift-ansible. I'm not sure if this is infra-specific or if it's actually a fundamental aspect of how OpenShift nodes work, but I suspect the latter, since I haven't made any major customizations to my Ansible workflow beyond the upstream openshift-ansible recipes.
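To confirm which server a pod will actually query, you can parse the nameserver line out of its resolv.conf. A small sketch (the sample file below mirrors the pod output above; inside a real pod you would read /etc/resolv.conf directly, then query the server with something like `dig @"$ns" kubernetes.default.svc.cluster.local +short`):

```shell
# Read the first nameserver out of a resolv.conf-style file.
# The sample file stands in for a pod's /etc/resolv.conf.
cat > /tmp/resolv.conf.sample <<'EOF'
nameserver 10.1.176.129
search jim-emea-test.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
EOF
ns=$(awk '/^nameserver/ {print $2; exit}' /tmp/resolv.conf.sample)
echo "$ns"
# -> 10.1.176.129
```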
It does appear to be normal for an openshift-ansible deployment to run dnsmasq on every node. As an example of how this can affect things, https://github.com/openshift/openshift-ansible/pull/8187 is instructive. In any case, if a local node's dnsmasq is acting flaky for any reason, it will prevent containers running on that node from properly resolving the addresses of other containers in the cluster.
After checking an individual node, I found that there was indeed a process bound to port 53, and that it is dnsmasq. Hence:
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ sudo netstat -tupln | grep 53
tcp        0      0 127.0.0.1:53    0.0.0.0:*    LISTEN    675/openshift
And, dnsmasq is running locally:
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ ps -ax | grep dnsmasq
4968 pts/0    S+     0:00 grep --color=auto dnsmasq
6994 ?        Ss     0:22 /usr/sbin/dnsmasq -k
[enguser@worker0-sass-on-prem-origin-3-10 ~]$ sudo ps -ax | grep dnsmasq
4976 pts/0    S+     0:00 grep --color=auto dnsmasq
6994 ?        Ss     0:22 /usr/sbin/dnsmasq -k
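If you want to script this check across nodes, the netstat output can be filtered to just the process holding port 53. A sketch using the sample line from the output above (on a live node you would pipe `sudo netstat -tupln` in instead):

```shell
# Given a netstat -tupln line, print the PID/program bound to port 53.
# Field 4 is the local address; the last field is PID/program.
line='tcp        0      0 127.0.0.1:53    0.0.0.0:*    LISTEN    675/openshift'
echo "$line" | awk '$4 ~ /:53$/ {print $NF}'
# -> 675/openshift
```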
The final clue: the node's resolv.conf itself adds the local IP address as a nameserver, and this is obviously copied into containers as they start.
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local bds-ad.lc opssight.internal
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.1.176.129
In my case, this was happening because the local nameserver was using an ifcfg file (you can see these files in /etc/sysconfig/network-scripts/) with the following contents:
[enguser@worker0-sass-on-prem-origin-3-10 network-scripts]$ cat ifcfg-ens192
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens192
UUID=50936212-cb5e-41ff-bec8-45b72b014c8c
DEVICE=ens192
ONBOOT=yes
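For reference, one hypothetical mitigation in setups like this (not the fix we used below, and the values are illustrative) is to stop DHCP from rewriting the node's DNS settings by disabling PEERDNS and pinning a known-good resolver in the ifcfg file:

```shell
# /etc/sysconfig/network-scripts/ifcfg-ens192 (fragment, illustrative values)
# PEERDNS=no tells NetworkManager not to take DNS servers from DHCP;
# DNS1 pins a resolver you trust instead.
PEERDNS=no
DNS1=10.1.176.129
```

Whether this is appropriate depends on how your data center hands out addresses; in our case the real problem was upstream, as described next.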
However, my internally configured virtual machines could not resolve the IPs handed to them by the PEERDNS-provided nameservers.
Ultimately the fix was to work with our IT department to make sure our authoritative domain for our kube clusters had access to all IP addresses in our data center.
If you're seeing the :53 errors when you try to kubectl or oc logs / exec, then it is likely that your apiserver is not able to connect with the kubelets via their IP addresses.
If you're seeing :53 errors in other places, for example inside of pods, then it is because your pod, using its own local DNS, isn't able to resolve internal cluster IP addresses. This might simply be because you have an outdated controller looking for services that don't exist anymore, or because there is flakiness at your Kubernetes DNS implementation level.