Organizing code repositories for an Airflow cloud deployment

3/17/2021

For a production Airflow deployment on Kubernetes, how should the code repositories be organized? Should the DAG definitions and the DAG business logic be split into separate repos?

I'm guessing that it's best practice to separate the DAG configuration, the DAG business logic and its dependencies, and the Airflow container image. Consequently, I'm envisioning the following:

  • A repo containing the Airflow Docker image
  • A repo containing the Airflow DAG definitions
  • A repo containing the Airflow DAG tasks (business logic)

In this setup, the Airflow DAGs would be git-synced onto the pod or made accessible via a PVC. The actual business logic for each task in a workflow would live in a separate repo that contains a Dockerfile. The image built from that repo would run as a sidecar container on the worker pod, and the individual DAG tasks would make entry-point/executable calls to it.
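
To make the split more concrete, here is a rough sketch of what I imagine the DAG-definition repo holding: only a task ID, a reference to the image built by the business-logic repo, and the entry point to call. It is plain Python rather than real Airflow code, and the `TaskSpec` class, the `pod_manifest` helper, the registry URL, the image tag, and the module names are all hypothetical. Note that it renders each task as its own pod instead of the sidecar arrangement described above, which is a variation I'm assuming for simplicity (in Airflow, something like the KubernetesPodOperator would presumably play this role).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class TaskSpec:
    """What the DAG-definition repo knows about one task (hypothetical shape)."""
    task_id: str
    image: str          # built and pushed by the business-logic repo's CI
    command: List[str]  # entry point exposed by that image
    upstream: List[str] = field(default_factory=list)


def pod_manifest(task: TaskSpec) -> Dict:
    """Render a task as a minimal Kubernetes Pod manifest (as a plain dict)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Pod names cannot contain underscores, so swap them for hyphens.
        "metadata": {"name": task.task_id.replace("_", "-")},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "task",
                    "image": task.image,
                    "command": list(task.command),
                }
            ],
        },
    }


# Hypothetical registry, image tag, and entry points.
TASKS = [
    TaskSpec(
        "extract_orders",
        "registry.example.com/etl-tasks:1.4.0",
        ["python", "-m", "etl.extract"],
    ),
    TaskSpec(
        "load_orders",
        "registry.example.com/etl-tasks:1.4.0",
        ["python", "-m", "etl.load"],
        upstream=["extract_orders"],
    ),
]

manifests = {task.task_id: pod_manifest(task) for task in TASKS}
```

The point of the sketch is that the DAG repo never imports the business logic; it only pins an image tag, so the two repos can be versioned and released independently.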

I'm curious how other devs are structuring their repos to be well suited for deploying Airflow on the cloud.

-- Nick Falco
airflow
git
kubernetes
repo

0 Answers