Mounting folders with KubernetesPodOperator on Google Composer/Airflow

10/12/2021

I am trying to mount the dags folder so I can run a Python script inside the KubernetesPodOperator on Airflow, but I can't figure out how to do it. In production I would like to do this in Google Composer. Here is my task:

from airflow.kubernetes.volume import Volume
from airflow.kubernetes.volume_mount import VolumeMount
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

kubernetes_min_pod = KubernetesPodOperator(
    task_id='pod-ex-minimum',
    cmds=["bash", "-c"],
    arguments=["cd /usr/local/tmp"],
    namespace='default',
    image='toru2220/scrapy-chrome:latest',
    is_delete_operator_pod=True,
    get_logs=True,
    in_cluster=False,
    # Back the mount with a PersistentVolumeClaim named "my-volume"
    volumes=[
        Volume("my-volume", {"persistentVolumeClaim": {"claimName": "my-volume"}})
    ],
    # Expose the volume inside the pod at /usr/local/tmp
    volume_mounts=[
        VolumeMount("my-volume", "/usr/local/tmp", sub_path=None, read_only=False)
    ],
)

What is the easiest way to mount the folder the DAG itself lives in?

-- Arthur Zangiev
airflow
google-cloud-composer
kubernetes

1 Answer

10/18/2021

As per this doc, when you create an environment, Cloud Composer creates a Cloud Storage bucket and associates the bucket with your environment. The name of the bucket is based on the environment region, name, and a random ID such as “us-central1-b1-6efannnn-bucket”. Cloud Composer stores the source code for your workflows (DAGs) and their dependencies in specific folders in Cloud Storage and uses Cloud Storage FUSE to map the folders to the Airflow instances in your Cloud Composer environment.
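As a rough illustration of that mapping (a minimal sketch, assuming the placeholder bucket name from above and the google-cloud-storage client library), you can list the objects under the dags/ prefix and see how each one corresponds to a path under /home/airflow/gcs/dags on the Airflow workers:

from google.cloud import storage

BUCKET = "us-central1-b1-6efannnn-bucket"  # placeholder; use your environment's bucket

client = storage.Client()
# Objects under gs://<bucket>/dags/ are exposed via Cloud Storage FUSE
# at /home/airflow/gcs/dags on the Airflow workers.
for blob in client.list_blobs(BUCKET, prefix="dags/"):
    print(f"gs://{BUCKET}/{blob.name}  ->  /home/airflow/gcs/{blob.name}")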

Cloud Composer runs on top of a GKE cluster, with all the DAGs, tasks, and services running on a single node pool. As per your requirement, you are trying to mount the DAGs folder in your code, but that folder is already mounted in the Airflow pods under the /home/airflow/gcs/dags path. Please refer to this doc for more information about KubernetesPodOperator in Cloud Composer.
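So, as a minimal sketch (assuming a hypothetical script named my_script.py shipped alongside the DAG in the bucket's dags/ folder), a task running on the Airflow workers can execute it directly from that mounted path, without mounting a PersistentVolumeClaim into a separate pod:

from airflow.operators.bash import BashOperator

# Runs on the Airflow worker, where the Composer bucket's dags/ folder
# is already available via Cloud Storage FUSE.
run_local_script = BashOperator(
    task_id="run-script-from-dags-folder",
    bash_command="python /home/airflow/gcs/dags/my_script.py",
)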

-- Vishal K
Source: StackOverflow