Is there a specific way to install python packages on pods?

4/11/2019

Some background: I have set up Airflow on Kubernetes (on AWS). I am able to run DAGs that query a database, send emails or do anything that doesn't require a package that isn't already a part of Airflow. For example, if I try to run a DAG that uses the Facebook-business SDK the DAG will obviously break because the dependency isn't available. I've tried a couple different ways of trying to get this dependency, along with others, installed but haven't been successful.

I have tried to install python packages by modifying my scheduler and webserver deployments to do a pip install of my dependencies as part of an initContainer. When I do this, the DAG remains broken as it is unable to find the needed packages. When I open a shell to my pod I can see that the dependencies have not been installed (I check using pip list). I have also verified that there aren't other python/pip versions installed.

I have also tried to install the dependencies by running pip install from a shell inside my pod. This does install the dependency in the correct place and makes it available. However, instead of the webserver UI showing that my DAG is broken, I get the "this dag isn't available in the webserver dagbag object" message.

I would expect that running pip install as part of my initContainer or container would make these dependencies available in my pod. However, this isn't the case. It's as if pip install runs without any issues, but by the time my pods are fully set up the python packages are nowhere to be found.

I forgot to say that I have found a way to make it work, but it feels somewhat hacky and like there should be a better way: if I open a shell to my webserver container and install the needed dependencies, and then open a shell to my scheduler and do the same thing, the dependencies are found and the DAG works.

-- Jesus Garcia
airflow
kubernetes
pip
python

2 Answers

4/11/2019

I would recommend updating your Airflow Docker image to include the libraries you need.

If you plan to use lots of different libraries for specific DAGs, it may be worth creating multiple Docker images and referencing them at the task level.

MyOperator(
    ...,
    executor_config={
        "KubernetesExecutor": {"image": "myCustomDockerImage"}
    },
)

Reference: baseoperator.py
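
If several tasks need different images, one way to keep this tidy is a small helper that builds the executor_config dict in the shape shown above. This is only a sketch: the task names, the image names and the helper executor_config_for are all made up for illustration and are not part of Airflow.

```python
# Hypothetical mapping from task name to Docker image (placeholder names).
TASK_IMAGES = {
    "facebook_ads_report": "my-registry/airflow-facebook:latest",
}

# Hypothetical fallback image for tasks that need no extra libraries.
DEFAULT_IMAGE = "my-registry/airflow-base:latest"


def executor_config_for(task_name):
    """Return an executor_config dict selecting the image for a task.

    Tasks without an entry in TASK_IMAGES fall back to DEFAULT_IMAGE.
    """
    image = TASK_IMAGES.get(task_name, DEFAULT_IMAGE)
    return {"KubernetesExecutor": {"image": image}}
```

It would then be used as MyOperator(..., executor_config=executor_config_for("facebook_ads_report")), so the image choice lives in one place rather than being repeated in each DAG file.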

-- Tomme
Source: StackOverflow

4/11/2019

The init container is a separate container with its own filesystem. Unless you rig up some sort of shared storage for your python libraries (which is quite dubious), any pip installs in the init container won't affect the running container of the pod.

I see two options:

1) Modify the docker image that you're using to include the packages you need

2) Prepend a pip install to the command being run in the pod. It's not uncommon to string together a few commands with && between them, in order to execute a sequence of operations in a starting pod.
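
To illustrate option 2, here is a rough sketch of how the chained command could be put together. The helper with_pip_install is hypothetical, and the package name facebook-business and the command airflow scheduler are assumptions based on the question; adjust them to whatever your container actually runs.

```python
import shlex


def with_pip_install(packages, command):
    """Wrap a container command so pip install runs first.

    Returns an argv list of the form ["/bin/sh", "-c", "<install> && exec <command>"].
    Because of the &&, the original command only starts if pip install succeeds.
    """
    install = "pip install --no-cache-dir " + " ".join(
        shlex.quote(package) for package in packages
    )
    run = " ".join(shlex.quote(part) for part in command)
    # exec replaces the shell with the real process so it receives signals directly.
    return ["/bin/sh", "-c", install + " && exec " + run]


# Assumed example values: the SDK from the question and the scheduler command.
argv = with_pip_install(["facebook-business"], ["airflow", "scheduler"])
print(argv)
```

The resulting list is what would go into the container's command in the deployment. The packages get reinstalled on every pod start, so this is slower and less reproducible than baking them into the image as in option 1.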

-- Laizer
Source: StackOverflow