How does DNS resolution work on Kubernetes with multiple networks?

9/11/2018

I have a 4 node Kubernetes cluster, 1 x controller and 3 x workers. The following shows how they are configured with the versions.

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME k8s-ctrl-1 Ready master 1h v1.11.2 192.168.191.100 <none> Ubuntu 18.04.1 LTS 4.15.0-1021-aws docker://18.6.1 turtle-host-01 Ready <none> 1h v1.11.2 192.168.191.53 <none> Ubuntu 18.04.1 LTS 4.15.0-29-generic docker://18.6.1 turtle-host-02 Ready <none> 1h v1.11.2 192.168.191.2 <none> Ubuntu 18.04.1 LTS 4.15.0-34-generic docker://18.6.1 turtle-host-03 Ready <none> 1h v1.11.2 192.168.191.3 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic docker://18.6.1

Each of the nodes has two network interfaces, for arguments sake eth0 and eth1. eth1 is the network that I want to the cluster to work on. I setup the controller using kubeadm init and passed --api-advertise-address 192.168.191.100. The worker nodes where then joined using this address.

Finally on each node I modified the kubelet service to have --node-ip set so that the layout looks as above.

The cluster appears to be working correctly and I can create pods, deployments etc. However the issue I have is that none of the pods are able to use the kube-dns service for DNS resolution.

This is not a problem with resolution, rather that the machines cannot connect to the DNS service to perform the resolution. For example if I run a busybox container and access it to perform nslookup i get the following:

/ # nslookup www.google.co.uk nslookup: read: Connection refused nslookup: write to '10.96.0.10': Connection refused

I have a feeling that this is down to not using the default network and because of that I suspect some Iptables rules are not correct, that being said these are just guesses.

I have tried both the Flannel overlay and now Weave net. The pod CIDR range is 10.32.0.0/16 and the service CIDR is as default.

I have noticed that with Kubernetes 1.11 there are now pods called coredns rather than one kube-dns.

I hope that this is a good place to ask this question. I am sure I am missing something small but vital so if anyone has any ideas that would be most welcome.

Update #1:

I should have said that the nodes are not all in the same place. I have a VPN running between them all and this is the network I want things to communicate over. It is an idea I had to try and have distributed nodes.

Update #2:

I saw another answer on SO (DNS in Kubernetes not working) that suggested kubelet needed to have --cluster-dns and --cluster-domain set. This is indeed the case on my DEV K8s cluster that I have running at home (on one network).

However it is not the case on this cluster and I suspect this is down to a later version. I did add the two settings to all nodes in the cluster, but it did not make things work.

Update #3

The topology of the cluster is as follows.

  • 1 x Controller is in AWS
  • 1 x Worker is in Azure
  • 2 x Worker are physical machines in a colo Data Centre

All machines are connected to each other using ZeroTier VPN on the 192.168.191.0/24 network.

I have not configured any special routing. I agree that this is probably where the issue is, but I am not 100% sure what this routing should be.

WRT to kube-dns and nginx, I have not tainted my controller so nginx is not on the master, not is busybox. nginx and busybox are on workers 1 and 2 respectively.

I have used netcat to test connection to kube-dns and I get the following:

/ # nc -vv 10.96.0.10 53 nc: 10.96.0.10 (10.96.0.10:53): Connection refused sent 0, rcvd 0 / # nc -uvv 10.96.0.10 53 10.96.0.10 (10.96.0.10:53) open

The UDP connection does not complete.

I modified my setup so that I could run containers on the controller, so kube-dns, nginx and busybox are all on the controller, and I am able to connect and resolve DNS queries against 10.96.0.10.

So all this does point to routing or IPTables IMHO, I just need to work out what that should be.

Update #4

In response to comments I can confirm the following ping test results.

Master -> Azure Worker (Internet)  : SUCCESS : Traceroute SUCCESS
Master -> Azure Worker (VPN)       : SUCCESS : Traceroute SUCCESS
Azure Worker -> Master (Internet)  : SUCCESS : Traceroute FAIL (too many hops)
Azure Worker -> Master (VPN)       : SUCCESS : Traceroute SUCCESS

Master -> Colo Worker 1 (Internet) : SUCCESS : Traceroute SUCCESS
Master -> Colo Worker 1 (VPN)      : SUCCESS : Traceroute SUCCESS
Colo Worker 1 -> Master (Internet) : SUCCESS : Traceroute FAIL (too many hops)
Colo Worker 1 -> Master (VPN)      : SUCCESS : Traceroute SUCCESS

Update 5

After running the tests above, it got me thinking about routing and I wondered if it was as simple as providing a route to the controller over the VPN for the service CIDR range (10.96.0.0/12).

So on a host, not included in the cluster, I added a route thus:

route add -net 10.96.0.0/12 gw 192.168.191.100

And I could then resolve DNS using the kube-dns server address:

nslookup www.google.co.uk 10.96.0.10

SO I then added a route, as above, to one of the worker nodes and tried the same. But it is blocked and I do not get a response. Given that I can resolve DNS over the VPN with the appropriate route from a non-kubernetes machine, I can only think that there is an IPTables rule that needs updating or adding.

I think this is almost there, just one last bit to fix.

I realise this is wrong as it it the kube-proxy should do the DNS resolution on each host. I am leaving it here for information.

-- Russell Seymour
kube-dns
kubernetes

2 Answers

9/11/2018

Sounds like you are running on AWS. I suspect that your AWS security group is not allowing DNS traffic to go through. You can try allowing all traffic to the Security Group(s) where all your master and nodes are, to see if that's the problem.

sg

You can also check that all your masters and nodes are allowing routing:

cat /proc/sys/net/ipv4/ip_forward

If not

echo 1 > /proc/sys/net/ipv4/ip_forward

Hope it helps.

-- Rico
Source: StackOverflow

9/11/2018

Following the instruction at this page, try to run this:

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster.local
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0

and see if a manual configuration works or you have some networking DNS problem.

-- Nicola Ben
Source: StackOverflow