I have a 4-node Kubernetes cluster: 1 controller and 3 workers. The following shows how they are configured, with versions:
NAME             STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-ctrl-1       Ready    master   1h    v1.11.2   192.168.191.100   <none>        Ubuntu 18.04.1 LTS   4.15.0-1021-aws     docker://18.6.1
turtle-host-01   Ready    <none>   1h    v1.11.2   192.168.191.53    <none>        Ubuntu 18.04.1 LTS   4.15.0-29-generic   docker://18.6.1
turtle-host-02   Ready    <none>   1h    v1.11.2   192.168.191.2     <none>        Ubuntu 18.04.1 LTS   4.15.0-34-generic   docker://18.6.1
turtle-host-03   Ready    <none>   1h    v1.11.2   192.168.191.3     <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   docker://18.6.1
Each of the nodes has two network interfaces; for argument's sake, eth0 and eth1. eth1 is the network that I want the cluster to work on. I set up the controller using kubeadm init and passed --apiserver-advertise-address 192.168.191.100. The worker nodes were then joined using this address.
Finally, on each node I modified the kubelet service to set --node-ip, so that the layout looks as above.
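One way to confirm that the --node-ip change took effect is to check that every INTERNAL-IP reported by kubectl get nodes -o wide sits on the VPN subnet. A minimal Python sketch using only the standard library; the node list is copied by hand from the table above, and nodes_off_subnet is a hypothetical helper name:

```python
import ipaddress

# (NAME, INTERNAL-IP) pairs copied from the `kubectl get nodes -o wide` table.
NODES = [
    ("k8s-ctrl-1", "192.168.191.100"),
    ("turtle-host-01", "192.168.191.53"),
    ("turtle-host-02", "192.168.191.2"),
    ("turtle-host-03", "192.168.191.3"),
]

# The VPN subnet the cluster is meant to communicate over.
VPN_SUBNET = ipaddress.ip_network("192.168.191.0/24")


def nodes_off_subnet(nodes, subnet):
    """Return the names of nodes whose INTERNAL-IP is outside `subnet`."""
    return [name for name, ip in nodes
            if ipaddress.ip_address(ip) not in subnet]


if __name__ == "__main__":
    off = nodes_off_subnet(NODES, VPN_SUBNET)
    print("all nodes report a VPN address" if not off else f"not on VPN: {off}")
```

An empty list only shows that kubelet advertises the intended addresses; it says nothing about whether traffic between those addresses is routed correctly.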
The cluster appears to be working correctly and I can create pods, deployments, etc. However, the issue I have is that none of the pods are able to use the kube-dns service for DNS resolution.
This is not a problem with resolution as such; rather, the containers cannot connect to the DNS service to perform the resolution. For example, if I run a busybox container and use it to run nslookup, I get the following:
/ # nslookup www.google.co.uk
nslookup: read: Connection refused
nslookup: write to '10.96.0.10': Connection refused
I have a feeling that this is down to not using the default network, and because of that I suspect some iptables rules are not correct; that said, these are just guesses.
I have tried both the Flannel overlay and now Weave Net. The pod CIDR range is 10.32.0.0/16 and the service CIDR is the default.
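Since the pod network, the service network and the VPN all use private ranges, it is worth ruling out an overlap between them. A small Python sketch using the standard ipaddress module; it assumes the service CIDR is the kubeadm default of 10.96.0.0/12 (consistent with the kube-dns address 10.96.0.10) and that the VPN is 192.168.191.0/24, and overlapping_pairs is a hypothetical helper name:

```python
import ipaddress
from itertools import combinations

# Address ranges in play. The service CIDR is assumed to be the kubeadm
# default; the VPN range is the ZeroTier network described in Update #3.
RANGES = {
    "pod CIDR": ipaddress.ip_network("10.32.0.0/16"),
    "service CIDR": ipaddress.ip_network("10.96.0.0/12"),
    "ZeroTier VPN": ipaddress.ip_network("192.168.191.0/24"),
}


def overlapping_pairs(ranges):
    """Return every pair of named networks that share at least one address."""
    return [(a, b)
            for (a, net_a), (b, net_b) in combinations(ranges.items(), 2)
            if net_a.overlaps(net_b)]


if __name__ == "__main__":
    clashes = overlapping_pairs(RANGES)
    print(clashes or "no overlapping ranges")
```

With the values above nothing overlaps, which rules out one common misconfiguration but not a routing or iptables problem.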
I have noticed that with Kubernetes 1.11 there are now pods called coredns rather than a single kube-dns pod, although the Service in front of them is still called kube-dns.
I hope this is a good place to ask this question. I am sure I am missing something small but vital, so any ideas would be most welcome.
Update #1:
I should have said that the nodes are not all in the same place. I have a VPN running between them all, and this is the network I want things to communicate over. It is an idea I had to try to have distributed nodes.
Update #2:
I saw another answer on SO (DNS in Kubernetes not working) that suggested kubelet needed to have --cluster-dns and --cluster-domain set. These flags are indeed set on the dev K8s cluster I have running at home (on a single network).
However, they are not set on this cluster, and I suspect this is down to the later version. I did add the two settings to all nodes in the cluster, but it did not make things work.
Update #3
The topology of the cluster is as follows.
All machines are connected to each other using ZeroTier VPN on the 192.168.191.0/24 network.
I have not configured any special routing. I agree that this is probably where the issue is, but I am not 100% sure what this routing should be.
With regard to kube-dns and nginx: I have not removed the master taint from my controller, so nginx is not on the master, nor is busybox. nginx and busybox are on workers 1 and 2 respectively.
I have used netcat to test the connection to kube-dns and I get the following:
/ # nc -vv 10.96.0.10 53
nc: 10.96.0.10 (10.96.0.10:53): Connection refused
sent 0, rcvd 0
/ # nc -uvv 10.96.0.10 53
10.96.0.10 (10.96.0.10:53) open
The UDP connection does not complete.
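A side note on the netcat result: UDP has no handshake, so nc -u typically reports "open" as soon as its socket is set up, whether or not anything is listening. Sending an actual DNS query and waiting for the reply gives a clearer signal. A minimal Python sketch using only the standard library (build_query, probe_udp and probe_tcp are hypothetical helper names, not part of any tool); it sends one A-record query over UDP and separately tries a plain TCP connect:

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe_udp(server, name, port=53, timeout=2.0):
    """Send one DNS query over UDP and classify what comes back."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on a UDP socket sends nothing; it only fixes the peer
            # so that an ICMP port-unreachable surfaces as ConnectionRefusedError.
            sock.connect((server, port))
            sock.send(query)
            reply = sock.recv(512)
        except socket.timeout:
            return "timeout"  # query sent, nothing came back
        except ConnectionRefusedError:
            return "refused"
        except OSError as exc:
            return f"error: {exc}"  # e.g. no route to host
    # A real DNS reply echoes the query ID in its first two bytes.
    return "answered" if reply[:2] == query[:2] else "unexpected reply"


def probe_tcp(server, port=53, timeout=2.0):
    """Attempt only the TCP handshake, like nc -vv server 53."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For example, from a pod with Python available, probe_udp("10.96.0.10", "www.google.co.uk") and probe_tcp("10.96.0.10"). "timeout" over UDP means the query left but no answer arrived, which is a different failure from an explicit "refused".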
I modified my setup so that I could run containers on the controller, so kube-dns, nginx and busybox are all on the controller, and I am able to connect and resolve DNS queries against 10.96.0.10.
So all this does point to routing or iptables IMHO; I just need to work out what that should be.
Update #4
In response to comments, I can confirm the following ping and traceroute results.
Master -> Azure Worker (Internet) : SUCCESS : Traceroute SUCCESS
Master -> Azure Worker (VPN) : SUCCESS : Traceroute SUCCESS
Azure Worker -> Master (Internet) : SUCCESS : Traceroute FAIL (too many hops)
Azure Worker -> Master (VPN) : SUCCESS : Traceroute SUCCESS
Master -> Colo Worker 1 (Internet) : SUCCESS : Traceroute SUCCESS
Master -> Colo Worker 1 (VPN) : SUCCESS : Traceroute SUCCESS
Colo Worker 1 -> Master (Internet) : SUCCESS : Traceroute FAIL (too many hops)
Colo Worker 1 -> Master (VPN) : SUCCESS : Traceroute SUCCESS
Update #5
After running the tests above, I started thinking about routing and wondered whether it was as simple as providing a route to the controller over the VPN for the service CIDR range (10.96.0.0/12).
So on a host, not included in the cluster, I added a route thus:
route add -net 10.96.0.0/12 gw 192.168.191.100
And I could then resolve DNS using the kube-dns server address:
nslookup www.google.co.uk 10.96.0.10
So I then added a route, as above, to one of the worker nodes and tried the same, but the traffic is blocked and I do not get a response. Given that I can resolve DNS over the VPN with the appropriate route from a non-Kubernetes machine, I can only think that there is an iptables rule that needs updating or adding.
I think this is almost there, just one last bit to fix.
I realise this is wrong, as it is kube-proxy on each host that should forward traffic for the DNS service IP. I am leaving it here for information.
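The route from Update #5 can still be sanity-checked on its own terms: the /12 has to cover the kube-dns IP, and the gateway has to be directly reachable on the VPN subnet. A Python sketch using the standard ipaddress module, with the values taken from the route add command (check_route is a hypothetical helper name):

```python
import ipaddress

# Values from: route add -net 10.96.0.0/12 gw 192.168.191.100
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")
GATEWAY = ipaddress.ip_address("192.168.191.100")  # the controller's VPN address
VPN_SUBNET = ipaddress.ip_network("192.168.191.0/24")
KUBE_DNS = ipaddress.ip_address("10.96.0.10")


def check_route(dest, gateway, on_link, target):
    """Return (target covered, gateway on-link, first address, last address)."""
    return (target in dest,
            gateway in on_link,
            dest.network_address,
            dest.broadcast_address)


if __name__ == "__main__":
    covered, on_link, first, last = check_route(
        SERVICE_CIDR, GATEWAY, VPN_SUBNET, KUBE_DNS)
    print(f"route covers {first} to {last}; "
          f"kube-dns covered: {covered}; gateway on VPN subnet: {on_link}")
```

Both checks pass, and the route spans 10.96.0.0 to 10.111.255.255, which fits with it working from the non-cluster host. On a worker, kube-proxy's iptables rules normally rewrite the service IP to a pod IP before routing, so a static route like this would not be what carries pod traffic there.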
Sounds like you are running on AWS. I suspect that your AWS security group is not letting DNS traffic through. You can try allowing all traffic within the security group(s) that contain your master and nodes, to see if that's the problem.
You can also check that all your masters and nodes allow routing (IP forwarding):
cat /proc/sys/net/ipv4/ip_forward
If it prints 0, enable it (as root):
echo 1 > /proc/sys/net/ipv4/ip_forward
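The same check and fix, as a Python sketch (forwarding_enabled and enable_forwarding are hypothetical helper names; the path parameter exists only so the functions can be pointed at a test file instead of /proc):

```python
from pathlib import Path

# Kernel switch for IPv4 forwarding: "1" = enabled, "0" = disabled.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(path=IP_FORWARD):
    """Same check as cat /proc/sys/net/ipv4/ip_forward, as a boolean."""
    return path.read_text().strip() == "1"


def enable_forwarding(path=IP_FORWARD):
    """Same as echo 1 > /proc/sys/net/ipv4/ip_forward; returns the new state."""
    if not forwarding_enabled(path):
        path.write_text("1\n")
    return forwarding_enabled(path)
```

Like the echo command, writing to /proc needs root and lasts only until the next reboot; making it permanent is normally done with a net.ipv4.ip_forward = 1 line in a sysctl configuration file.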
Hope it helps.
Following the instructions on this page, try running this:
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster.local
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
and see whether the manual configuration works or whether you have a networking problem affecting DNS.
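If the pod starts, its /etc/resolv.conf should contain only the values from dnsConfig, since dnsPolicy "None" tells Kubernetes not to add the cluster DNS settings. A minimal Python sketch that parses the file's text and compares it with what the manifest asks for; EXPECTED and parse_resolv_conf are illustrative names, and 1.2.3.4 is just the placeholder nameserver from the manifest:

```python
# What the dnsConfig in the manifest should produce in /etc/resolv.conf.
EXPECTED = {
    "nameserver": ["1.2.3.4"],
    "search": ["ns1.svc.cluster.local", "my.dns.search.suffix"],
    "options": ["ndots:2", "edns0"],
}


def parse_resolv_conf(text):
    """Collect nameserver, search and options entries from resolv.conf text."""
    conf = {"nameserver": [], "search": [], "options": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, values = fields[0], fields[1:]
        if key == "search":
            conf["search"] = values  # the last search line wins
        elif key in ("nameserver", "options"):
            conf[key].extend(values)  # these accumulate across lines
    return conf
```

For example, feed it the output of kubectl exec dns-example -- cat /etc/resolv.conf and compare the result with EXPECTED. To test actual resolution rather than just the configuration, 1.2.3.4 would need to be replaced with a DNS server the pod can reach.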