I've got a two node Kubernetes EKS cluster which is running "v1.12.6-eks-d69f1"
Amazon VPC CNI Plugin for Kubernetes version: amazon-k8s-cni:v1.4.1
CoreDNS version: v1.1.3
KubeProxy: v1.12.6
There are two CoreDNS pods running on the cluster.
The problem I have is that my pods are resolving internal DNS names intermittently. (Resolution of external DNS names work just fine)
root@examplecontainer:/# curl http://elasticsearch-dev.internaldomain.local:9200/
curl: (6) Could not resolve host: elasticsearch-dev.internaldomain.local
elasticsearch-dev.internaldomain.local is registered on an AWS Route53 Internal Hosted Zone. The above works intermittenly, if I fire five requests, two of them would resolve correctly and the rest would fail.
These are the contents of the /etc/resolv.conf file on the examplecontainer above:
root@examplecontainer:/# cat /etc/resolv.conf
nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
Any ideas why this might be happening?
you should try below dns from container
curl http://elasticsearch-dev.default.svc.cluster.local:9200/
pleae take a look for this "Enabling DNS resolution for Amazon EKS cluster endpoints" here.
The Amazon Route 53 private hosted zone that is created for the endpoint is only associated with the worker node VPC.
If it's similar toy your env. you can find solution here.
Please share with the results.
I fixed this issue by switching from a custom "DHCP option set" to the default "DHCP option set" provided by AWS. I created the custom "DHCP option set" months ago and assigned it to the VPC where the EKS cluster is running...
How did I get to the bottom of this?
After running "kubectl get events -n kube-system", I realised of the following:
Warning DNSConfigForming 17s (x15 over 14m) kubelet, ip-10-4-9-155.us-west-1.compute.internal Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.4.8.2 8.8.8.8 8.8.4.4
8.8.8.8 and 8.8.4.4 were injected by the troublesome "DHCP options set" that I created. And I think that the reason why my services where resolving internal DNS names intermittently was because the CoreDNS service was internally forwarding DNS requests to 10.4.8.2, 8.8.4.4, 8.8.8.8 in a round robin fashion. Since the last 2 DNS servers don't know about my Route53 internal hosted zone DNS records, the resolution failed intermittently.
Note 10.4.8.2 is the default AWS nameserver.
As soon as switch to the default "DHCP option set" provided by AWS, the EKS services can resolve my internal DNS names consistently.
I hope this will help someone in the future.