DNS timeout on EKS

8/21/2021

I'm facing a strange issue with my EKS cluster.

At random intervals, I see DNS requests timing out in my cluster for various pods. Sometimes my pods cannot reach RDS instances because of the timeout:

dial tcp: lookup myapp.zzzz.eu-west-1.rds.amazonaws.com on 172.20.0.10:53: no such host

And sometimes I can't even resolve a GitHub URL :/
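Since the failures are intermittent, a small probe can help quantify how often lookups fail and how slow the worst one is. The following is only a sketch: the `probe` helper, its parameters, and the hostname in the usage comment are illustrative assumptions, not something taken from the cluster described here. It relies on Python's standard `socket.getaddrinfo`, which goes through the pod's resolv.conf.

```python
import socket
import time


def probe(name, attempts=20, pause=0.5, resolve=socket.getaddrinfo):
    """Resolve `name` repeatedly; return (failures, slowest_seconds).

    `resolve` is injectable so the loop can be exercised without a network.
    """
    failures = []
    slowest = 0.0
    for i in range(attempts):
        start = time.monotonic()
        try:
            resolve(name, 443)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((i, str(exc)))
        slowest = max(slowest, time.monotonic() - start)
        time.sleep(pause)
    return failures, slowest


# Example usage from inside an affected pod (hostname is a placeholder):
#   failed, slowest = probe("github.com")
#   print(f"{len(failed)} failed lookups, slowest took {slowest:.2f}s")
```

Running the same probe from pods on different nodes may help show whether the failures follow a particular node or are spread across the cluster.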

I saw that there was a race condition issue some time ago (https://github.com/awslabs/amazon-eks-ami/issues/357), but it was fixed at some point. The resolv.conf file in one of my pods looks like this:

nameserver 172.20.0.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal
options ndots:5
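For context on what this resolv.conf implies: with ndots:5, a name with fewer than five dots is tried against each search domain before it is tried as-is, so one lookup of an external name can turn into several DNS queries. The sketch below models the glibc-style ordering under that assumption (other resolvers, such as musl, behave differently); `candidate_queries` is a hypothetical helper written for illustration, not part of any library.

```python
def candidate_queries(name, search, ndots):
    """Return the FQDNs a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search domains.
        return [absolute] + expanded
    # Too few dots: walk the search list first, absolute name last.
    return expanded + [absolute]


# Search list and ndots value copied from the resolv.conf above.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "eu-west-1.compute.internal"]

for host in ("github.com", "myapp.zzzz.eu-west-1.rds.amazonaws.com"):
    queries = candidate_queries(host, SEARCH, ndots=5)
    print(host, "->", len(queries), "candidates, first:", queries[0])
```

The resolver stops at the first candidate that resolves, so github.com (one dot) goes through all four search domains before the absolute name is tried, while the RDS hostname (five dots) is tried as-is first. This amplification does not by itself explain the timeouts, but it does mean that every dropped packet has more chances to hit a lookup.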

I'm using Calico as the CNI with the default configuration, and CoreDNS is on its default configuration as well. I don't see any timeouts or errors in my CoreDNS logs.

EKS version: 1.21

AMI: amazon-eks-node-1.21-v20210813

Could you point me in the right direction? I don't really know where to look at the moment.

-- ELIZABETHHHHHH
amazon-eks
amazon-web-services
coredns
dns
kubernetes

1 Answer

8/27/2021

It turned out to be a Calico bug. I created a ticket for it (https://github.com/projectcalico/calico/issues/4866); the "solution" is to downgrade to v3.19.1.

-- ELIZABETHHHHHH
Source: StackOverflow