Is it feasible to use the Vertical Pod Autoscaler with Airflow at the task level?

12/7/2020

I currently use Airflow (via Cloud Composer) with the Celery Executor and the KubernetesPodOperator.

One challenge I have is using resources efficiently when some Airflow tasks use relatively little memory and others use many GB.

It would be great if Airflow tasks could be used in conjunction with the Kubernetes Vertical Pod Autoscaler (VPA). The idea is that, over time, Kubernetes would observe how many resources a task tends to use and size its pod accordingly. In my case it would be Google's Vertical Pod Autoscaler, but the differences don't seem important here. Of course, I would start with the VPA's automatic updates turned off and just look at the recommendations.
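To make "turned off" concrete, here is a minimal sketch of the kind of recommendation-only VPA object I have in mind, built as a Python dict and printed as JSON. The names `etl-task-vpa` and `etl-task` are hypothetical placeholders. As far as I understand, a VPA is attached to a workload through `targetRef`, and the `Deployment` target below is only an assumption for illustration; what to point it at for pods created by the KubernetesPodOperator is part of what I am asking.

```python
import json


def vpa_manifest(name, target_kind, target_name, namespace="default"):
    """Build a VerticalPodAutoscaler manifest that only records recommendations."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the target is a controller such as a Deployment.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            # "Off" means the VPA computes recommendations but never changes pods.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# Hypothetical names, for illustration only.
manifest = vpa_manifest("etl-task-vpa", "Deployment", "etl-task")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied to the cluster, after which the recommendations should appear in the VPA object's status.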

Since each of my Airflow tasks generally processes the same amount of data from day to day, it seems doable. If one day a task has a lot more data (and hence needs more resources), my hope would be that either:

1. The pod will eke by and then get larger the next time it runs.
2. The pod will be evicted, and when Airflow retries the task the next pod will use more resources.
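As a rough illustration of the behaviour I am hoping for, here is a toy sketch of how a request could grow after one unusually heavy run. The function `next_memory_request_mib` and its parameters are hypothetical, and this is a deliberate simplification, not the VPA's actual recommendation algorithm.

```python
def next_memory_request_mib(observed_peaks_mib, headroom_pct=15, floor_mib=256):
    """Return a memory request (MiB) for the next run based on past peak usage.

    Hypothetical stand-in for a recommender, NOT the VPA's real algorithm.
    """
    if not observed_peaks_mib:
        # No history yet: fall back to a minimum request.
        return floor_mib
    peak = max(observed_peaks_mib)
    # Add headroom using integer math, rounding up to the next MiB.
    padded = -(-peak * (100 + headroom_pct) // 100)
    return max(padded, floor_mib)


usual_days = [900, 950, 920]  # made-up peak usage in MiB
print(next_memory_request_mib(usual_days))
print(next_memory_request_mib(usual_days + [3000]))  # one heavy day raises the next request
```

In this toy model a single heavy day immediately raises the next request, which corresponds to the first outcome above.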

I've read that when using Celery in general, one challenge of scaling is that the "worker" is a pod, and it is hard to know how many resources to give a worker because any task can be assigned to it. However, with the KubernetesPodOperator this doesn't seem like a problem, since each task runs in its own pod.

There is also the option of using the Kubernetes Executor, but since that's not supported in Cloud Composer yet, we can leave it out.

I did search for tutorials or others sharing their experience but didn't have luck finding anyone using VPA with Airflow. If you know of any resources please do share them.

In summary: Does this sound reasonable/feasible? Or are there any other factors that could create issues here?

Edit:

I have documented the problem more clearly in this repo: https://github.com/RayBB/airflow-vpa

-- RayB
airflow
google-cloud-composer
google-kubernetes-engine
kubernetes
