Kubernetes HorizontalPodAutoscaler for pods spawned by Airflow DAGs that use KubernetesPodOperator

3/9/2020

I have Airflow deployed in a k8s cluster via helm, using the CeleryExecutor with a defined number of workers. I originally thought that number was a maximum, but it turns out the worker pods are always up, so it is effectively a static pool of workers. To make the pool dynamic, I am looking at a HorizontalPodAutoscaler. However, I am using KubernetesPodOperator, which means the workers do not run the tasks themselves; each task spins up a new resource (pod). This is a problem for a HorizontalPodAutoscaler: I need to scale the number of workers, but since the workers are not doing the work, their CPU/memory usage does not reflect the load, so the autoscaler cannot decide correctly when to scale up or down. I think a HorizontalPodAutoscaler can be used with Airflow running the KubernetesExecutor, but I am not sure about the CeleryExecutor. Basically, what I need is a HorizontalPodAutoscaler driven by the metrics of the pods that KubernetesPodOperator creates, which then increases or decreases the number of workers.

So how do I use a HorizontalPodAutoscaler, or is there some alternative to make the worker pool size dynamic?
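To make the mismatch concrete, here is a sketch of the scaling rule a HorizontalPodAutoscaler applies (a simplified form of the formula from the Kubernetes documentation; the worker counts and utilization numbers below are hypothetical). Because the Celery workers sit nearly idle while the spawned task pods consume the CPU, the workers' own utilization stays low and the HPA would scale them down rather than up:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified HPA rule:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    """
    return math.ceil(current_replicas * current_metric / target_metric)

# Hypothetical scenario: 3 Celery workers, target 50% CPU utilization.
# The real work runs in pods spawned by KubernetesPodOperator, so the
# workers themselves idle at ~5% CPU -> the HPA sees them as underloaded.
print(desired_replicas(3, 5.0, 50.0))   # -> 1 (scales the workers DOWN)

# Only if the workers themselves were loaded would the HPA add replicas:
print(desired_replicas(3, 90.0, 50.0))  # -> 6
```

This is why worker CPU/memory is the wrong signal here: the load the questioner wants to react to lives in the task pods, not in the worker deployment the HPA would be attached to.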

-- alltej
airflow
horizontalpodautoscaler
kubernetes
kubernetes-helm

1 Answer

3/11/2020

If you are using the KubernetesExecutor, there is no need to add a HorizontalPodAutoscaler: the executor scales on its own, because it creates a new pod for every task that is going to be executed, so capacity grows and shrinks with the task load. And yes, you can define the resources for each pod in the DAG. If Airflow is deployed on Kubernetes, the CeleryExecutor is not a good idea.
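To illustrate the "define resources in the DAG" point, here is a minimal DAG sketch. It assumes the cncf.kubernetes provider is installed; the DAG id, image, namespace, and resource figures are all hypothetical, and on older Airflow versions the parameter may be `resources` (a dict) rather than `container_resources`:

```python
# Hypothetical DAG sketch: the task runs in a new pod whose resources are
# declared here, on the spawned pod, not on the worker that launches it.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG(
    dag_id="pod_per_task_example",       # hypothetical DAG id
    start_date=datetime(2020, 3, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    heavy_task = KubernetesPodOperator(
        task_id="heavy_task",
        name="heavy-task",                # hypothetical pod name
        namespace="airflow",              # hypothetical namespace
        image="myrepo/heavy-job:latest",  # hypothetical image
        cmds=["python", "-c", "print('work happens in this pod')"],
        # Requests/limits apply to the spawned pod; with KubernetesExecutor
        # the scheduler launches one such pod per task, so the cluster
        # autoscaler (not an HPA on workers) handles capacity.
        container_resources=k8s.V1ResourceRequirements(
            requests={"cpu": "500m", "memory": "512Mi"},
            limits={"cpu": "1", "memory": "1Gi"},
        ),
        get_logs=True,
    )
```

The sketch requires a running Airflow environment to execute, so treat it as a configuration example rather than a standalone script.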

-- ANISH KUMAR MOURYA
Source: StackOverflow