We have Apache Airflow deployed on a K8s cluster in AWS. Airflow runs in containers, but the EC2 instances themselves are reserved instances.
We are seeing Airflow make many DNS queries for its DB. At rest (i.e. no DAGs are running) it is about 10 per second. When running several DAGs it can go up to 50 per second. This results in Route53 blocking us, since we are hitting the packet limit for DNS queries (1024 packets per second).
Our DB is a Postgres RDS, and the issue remains when we switch it to MySQL.
The way we understand it, the DNS query starts at the K8s coredns service, which tries several permutations of the FQDN and sends the requests to Route53 if it can't resolve them on its own.
Any ideas, thoughts, or hints that explain Airflow's behavior or how to reduce the number of queries are most welcome.
After some digging we found we had several issues happening at the same time.
The first was that Airflow's scheduler was running about 2 times per second. Each run issued DB queries, which in turn resulted in several DNS queries. Changing that scheduling alleviated some of the issue.
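To make the effect of the scheduler frequency concrete, here is a rough back-of-the-envelope model. It is only a sketch: the function name and every number in it (lookups per loop, candidate names per lookup, two record types) are illustrative assumptions, not values measured from Airflow.

```python
def dns_queries_per_second(loops_per_second, lookups_per_loop,
                           names_per_lookup=1, record_types=2):
    """Rough upstream DNS query rate generated by a polling loop.

    All parameters are illustrative assumptions, not measured Airflow values.
    """
    return loops_per_second * lookups_per_loop * names_per_lookup * record_types


# Hypothetical workload: 2 loops/s, 1 hostname lookup per loop,
# 4 candidate names per lookup, one A and one AAAA query per name.
before = dns_queries_per_second(2, 1, names_per_lookup=4)

# Same workload with the loop slowed to once every 5 seconds.
after = dns_queries_per_second(0.2, 1, names_per_lookup=4)

print(f"before: {before} queries/s, after: {after} queries/s")
```

In this model the query rate scales linearly with the loop frequency, which would be consistent with a slower scheduler loop relieving part of the load while leaving the per-lookup multiplier untouched.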
Another issue we had is described here. It looks like coredns is configured to try some alternatives of the given domain if it has fewer than x dots (.) in the FQDN. There are 2 suggested fixes in that article. We followed them through and the number of DNS queries dropped.
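The linked article is not reproduced here, so the following is only a sketch of the general mechanism behind the behavior described above. It models how a glibc-style stub resolver expands a name using the search list and the ndots option from a pod's /etc/resolv.conf (Kubernetes pods typically get ndots:5). The search domains and the DB hostname are hypothetical placeholders. Lowering ndots and writing the hostname with a trailing dot are two commonly used mitigations; that they are the same two fixes the article suggests is an assumption.

```python
def candidate_names(name, search_domains, ndots):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first; the real name comes last.
    return expanded + [absolute]


# Hypothetical values modelled on a typical pod /etc/resolv.conf.
SEARCH = ["airflow.svc.cluster.local", "svc.cluster.local", "cluster.local"]
DB_HOST = "airflow-db.internal.example.com"  # placeholder hostname, 3 dots

for label, host, ndots in [
    ("ndots:5", DB_HOST, 5),
    ("ndots:2", DB_HOST, 2),
    ("trailing dot", DB_HOST + ".", 5),
]:
    print(label)
    for candidate in candidate_names(host, SEARCH, ndots):
        print("   ", candidate)
```

A lookup stops at the first candidate that resolves. With ndots:5 the real name is only tried after three cluster-internal names have failed, and each of those failures can be forwarded upstream. In the other two cases the real name is tried first.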
We have been having this issue too.
It wasn't the easiest to find, as we had one box with lots of apps on it making 1000s of DNS queries requesting resolution of our SQL server name.
I really wonder why Airflow doesn't just use the DNS cache like every other application.