How to reference a DAG's execution date inside of a `KubernetesPodOperator`?

4/22/2019

I am writing an Airflow DAG to pull data from an API and store it in a database I own. Following best practices outlined in "We're All Using Airflow Wrong", I'm writing the DAG as a sequence of KubernetesPodOperators that run pretty simple Python functions as the entry point to the Docker image.

The problem I'm trying to solve is that this DAG should only pull data for the execution_date.

If I were using a PythonOperator (doc), I could use the provide_context argument to make the execution date available to the function. But judging from the KubernetesPodOperator's documentation, the Kubernetes operator seems to have no argument that does what provide_context does.

My best guess is that you could use the arguments parameter to pass in the execution date, and since arguments is templated, you can reference it like this:

my_pod_operator = KubernetesPodOperator(
    # ... other args here
    arguments=['python', 'my_script.py', '{{ ds }}'],
    # arguments continue
)

And then you'd get the execution date in the script the way you'd get any other command-line argument, via sys.argv.
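For instance, a minimal sketch of the script side (the file name my_script.py and the function are hypothetical): the rendered '{{ ds }}' arrives as an ordinary 'YYYY-MM-DD' string on the command line.

```python
# my_script.py (hypothetical): Airflow renders '{{ ds }}' before the pod
# starts, so the script just sees a plain 'YYYY-MM-DD' string in sys.argv.
import sys
from datetime import datetime

def parse_execution_date(argv):
    # argv[0] is the script name; argv[1] is the rendered '{{ ds }}' value.
    return datetime.strptime(argv[1], "%Y-%m-%d").date()

if __name__ == "__main__" and len(sys.argv) > 1:
    execution_date = parse_execution_date(sys.argv)
    print(f"Pulling data for {execution_date.isoformat()}")
```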

Is this the right way of doing it?

Thanks for the help.

-- Eric Fulmer
airflow
kubernetes

1 Answer

4/23/2019

Yes, that is the correct way of doing it.

Each operator has a template_fields attribute. All the parameters listed in template_fields can render Jinja2 templates and Airflow macros.

For KubernetesPodOperator, if you check the docs, you will find:

template_fields = ['cmds', 'arguments', 'env_vars', 'config_file']

which means you can pass '{{ ds }}' to any of the four params listed above.
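As a hedged sketch (assuming the Airflow 1.10-era contrib import path; the task id, pod name, namespace, and image are placeholders), the execution date can be passed through both arguments and env_vars, since both are in template_fields:

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Both 'arguments' and 'env_vars' are template_fields, so '{{ ds }}'
# is rendered to the execution date before the pod is launched.
pull_data = KubernetesPodOperator(
    task_id='pull_data',                      # placeholder task id
    name='pull-data',                         # placeholder pod name
    namespace='default',                      # placeholder namespace
    image='my-registry/my-image:latest',      # placeholder image
    cmds=['python', 'my_script.py'],
    arguments=['{{ ds }}'],                   # as a positional CLI argument
    env_vars={'EXECUTION_DATE': '{{ ds }}'},  # or as an environment variable
)
```

Inside the container, the script can then read the date either from sys.argv or from the EXECUTION_DATE environment variable; this fragment only defines the task, so it needs a running Airflow deployment to execute.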

-- kaxil
Source: StackOverflow