Airflow - Kubernetes Executor: How to mount only the directory of a Persistent Volume Claim that corresponds to the run_id

11/26/2019

I am using Airflow with the Kubernetes executor.

It works when I use executor_config to mount a PersistentVolumeClaim.

However, I would like to mount only a subPath, and that subPath should be dynamic, something like this:

executor_config={
    "KubernetesExecutor": {
        "volumes": [
            {
                "name": "workdir-volume",
                "persistentVolumeClaim": {"claimName": "my-volume-claim"},
            },
        ],
        "volume_mounts": [
            {
                "mountPath": "/app/workdir/",
                "name": "workdir-volume",
                "subPath": "{{ run_id }}_{{ ds }}",
            },
        ],
    }
},

It doesn’t work, for two reasons:

  • executor_config is not in template_fields, so it is not templated by default. To work around this, I created a new operator that includes executor_config in its template_fields.

  • My understanding is that the rendering only happens after the pod has started: the rendered task looks fine in the dashboard, but the subPath of the mounted directory is not rendered.

Does someone have an idea on how to do this?

-- Olivier Cazade
airflow
kubernetes
python

1 Answer

3/9/2020

AFAIK this is not doable, because run_id and ds are only available within each DAG/task run. You need to handle this in your script by passing the values in as parameters in the task definition. Example:

t1 = BashOperator(
    task_id='t1',
    bash_command="extract.py --path='{{ run_id }}'",
)
...

The script can then use this parameter to create the per-run subdirectory itself.
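As a rough illustration of what the script side could look like, here is a minimal sketch of an extract.py that turns the --path value into a directory under the mounted volume. The helper name prepare_run_dir, the --base-dir option and the character replacement rule are assumptions made for this example, not part of the original setup; only the --path argument and the /app/workdir mount path come from the snippets above.

```python
import argparse
import re
from pathlib import Path

# Assumption: the whole PVC is mounted here, as in the question's mountPath.
DEFAULT_BASE_DIR = "/app/workdir"


def prepare_run_dir(base_dir, run_id):
    """Create (if needed) and return a per-run directory under base_dir."""
    # A run_id such as "scheduled__2019-11-26T00:00:00+00:00" contains
    # characters that are awkward in paths, so replace them with "_".
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "_", run_id)
    run_dir = Path(base_dir) / safe_name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def main(argv=None):
    parser = argparse.ArgumentParser(description="Per-run working directory sketch")
    parser.add_argument("--path", default=None, help="run_id passed in by the task")
    parser.add_argument("--base-dir", default=DEFAULT_BASE_DIR)
    args = parser.parse_args(argv)
    if args.path is None:
        # Nothing to do without a run_id; show usage instead of failing.
        parser.print_usage()
        return None
    run_dir = prepare_run_dir(args.base_dir, args.path)
    print(run_dir)
    # The actual extraction work would write its files under run_dir.
    return run_dir


if __name__ == "__main__":
    main()
```

With this approach the task mounts the whole claim at /app/workdir/ and each run only writes inside its own subdirectory, so the isolation is by convention in the script rather than enforced by the volume mount.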

-- alltej
Source: StackOverflow