Airflow too many DNS lookups for database

11/26/2020

We have Apache Airflow deployed on a K8s cluster in AWS. Airflow runs in containers, but the EC2 instances themselves are reserved instances.

We are seeing Airflow make many DNS queries for its DB. At rest (i.e. no DAGs are running) it's about 10 per second. When running several DAGs it can go up to 50 per second. This results in Route53 blocking us, since we are hitting the packet limit for DNS queries (1024 packets per second).

Our DB is a Postgres RDS instance, and the issue remains when we switch it to MySQL.

The way we understand it, the DNS query starts at the K8s coredns service, which tries several permutations of the FQDN and sends the requests to Route53 if it can't resolve them on its own.
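To make the "several permutations" concrete, here is a minimal Python sketch of how a stub resolver typically expands a name using a search list and an `ndots` threshold. The search list and `ndots` value below are the ones Kubernetes commonly writes into a pod's `/etc/resolv.conf` for the `default` namespace; the hostname `db.example.internal` is hypothetical, and real nodes often add further search domains of their own. This is an illustration of the general mechanism, not a trace of what this particular cluster does.

```python
def candidate_queries(name, search, ndots):
    """Return the names a stub resolver would try, in order (worst case)."""
    # A trailing dot marks the name as absolute: no search list is applied.
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    # With at least `ndots` dots the name is tried as-is first;
    # otherwise the search list is walked before the name itself.
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Assumed values: typical pod defaults in the "default" namespace.
K8S_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
K8S_NDOTS = 5

if __name__ == "__main__":
    # Hypothetical DB hostname with only two dots.
    for query in candidate_queries("db.example.internal", K8S_SEARCH, K8S_NDOTS):
        print(query)
```

With these assumed values, the two-dot hostname is tried with each of the three search suffixes before the bare name is queried, so one connection attempt can turn into four lookups, and usually twice that when both A and AAAA records are requested.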

Any ideas, thoughts, or hints to explain Airflow's behavior or how to reduce the number of queries are most welcome.

Best,

-- Guy Grin
airflow
amazon-route53
kubernetes

2 Answers

11/30/2020

After some digging we found we had several issues happening at the same time.

The first was that Airflow's scheduler was running about 2 times per second. Each run issued DB queries, which in turn resulted in several DNS queries. Reducing that scheduling frequency alleviated some of the issue.

Another issue we had is described here. It looks like coredns is configured to try some alternatives of the given domain if the FQDN contains fewer than x dots (.). There are 2 suggested fixes in that article. We followed them through and the number of DNS queries dropped.
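One commonly cited way to avoid the search-list expansion is to make the hostname absolute by adding a trailing dot; whether that is one of the two fixes in the linked article is an assumption. The sketch below rewrites the host part of a database connection URI that way. The function name `with_absolute_host` and the example URI are hypothetical, and whether a given DB driver or TLS setup accepts a trailing dot needs to be verified before relying on it.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def with_absolute_host(uri: str) -> str:
    """Return `uri` with a trailing dot appended to its hostname."""
    parts = urlsplit(uri)
    host = parts.hostname
    # Nothing to do for URIs without a host or with an already absolute name.
    if not host or host.endswith("."):
        return uri
    # IP literals are never expanded through the search list; leave them alone.
    try:
        ipaddress.ip_address(host)
        return uri
    except ValueError:
        pass
    # Split "user:password@host:port" by hand so credentials stay untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    name, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{name}.{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Hypothetical connection string, not taken from the original post.
    print(with_absolute_host(
        "postgresql+psycopg2://user:pw@db.example.internal:5432/airflow"
    ))
```

The other commonly cited option is lowering `ndots` for the pod through its `dnsConfig`, which changes resolver behavior for every name the pod looks up rather than for a single connection string.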

-- Guy Grin
Source: StackOverflow

2/5/2021

We have been having this issue too.

It wasn't the easiest to find, as we had one box with lots of apps on it making 1000s of DNS queries requesting resolution of our SQL server name.

I really wonder why Airflow doesn't just use the DNS cache like every other application.
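For illustration only, this is roughly what caching a lookup inside a process could look like in Python: a small wrapper that remembers the result of `socket.getaddrinfo` for a fixed time. The class name `CachingResolver` and the 30-second TTL are hypothetical, the sketch ignores the TTL carried by the DNS record itself, and nothing here is claimed to be how Airflow or its DB drivers resolve names.

```python
import socket
import time


class CachingResolver:
    """Remember lookup results for a fixed number of seconds."""

    def __init__(self, ttl_seconds=30.0, resolve=socket.getaddrinfo,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolve = resolve   # injectable, so it can be tested offline
        self._clock = clock
        self._cache = {}          # (host, port) -> (expiry time, result)

    def lookup(self, host, port):
        key = (host, port)
        now = self._clock()
        hit = self._cache.get(key)
        # Serve from the cache while the entry has not expired.
        if hit is not None and hit[0] > now:
            return hit[1]
        result = self._resolve(host, port)
        self._cache[key] = (now + self._ttl, result)
        return result
```

Calling `CachingResolver().lookup(host, port)` repeatedly within the TTL would reach the DNS server only once per host and port. In practice a node-level or cluster-level DNS cache tends to be the less invasive place for this than application code.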

-- g0pher
Source: StackOverflow