We're running Airflow 1.10.12 with the KubernetesExecutor and KubernetesPodOperator. In the past few days we've been seeing tasks get stuck in the queued state indefinitely (unless we restart the scheduler, they stay stuck), while new tasks of the same DAG keep getting scheduled properly.
The only thing that helps is either clearing the stuck task manually or restarting the scheduler service.
We usually see it happen when we run our E2E tests, which spawn ~20 DAG runs for every one of our 3 DAGs. Due to limited parallelism, some of those runs end up queued, which is fine by us.
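For context, the burst of runs is roughly equivalent to something like the following sketch (simplified; the DAG ids and run-id format here are placeholders, not our real pipeline names, and we assume the Airflow 1.10 `trigger_dag` CLI):

```python
# Illustrative sketch only: fire ~20 runs per DAG via the Airflow 1.10 CLI.
import subprocess
from datetime import datetime

DAG_IDS = ["dag_a", "dag_b", "dag_c"]  # placeholder DAG ids
RUNS_PER_DAG = 20

for dag_id in DAG_IDS:
    for i in range(RUNS_PER_DAG):
        # Unique run_id per triggered run
        run_id = "e2e_{}_{}".format(datetime.utcnow().strftime("%Y%m%dT%H%M%S%f"), i)
        subprocess.run(
            ["airflow", "trigger_dag", dag_id, "--run_id", run_id],
            check=True,
        )
```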
These are our parallelism params in airflow.cfg:
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
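These are the values the scheduler should be picking up; as a sanity check (not part of our DAG code), they can be read back from a Python shell on the scheduler, assuming the settings live under the standard `[core]` section:

```python
# Quick check of the effective concurrency settings on the scheduler (Airflow 1.10).
from airflow.configuration import conf

print(conf.getint("core", "parallelism"))              # expected: 32
print(conf.getint("core", "dag_concurrency"))          # expected: 16
print(conf.getint("core", "max_active_runs_per_dag"))  # expected: 16
```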
Two of our DAGs override max_active_runs and set it to 10 (simplified example below).
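The override is done at DAG definition time, roughly like this (a minimal sketch; the DAG id, namespace, image, and task are placeholders, not our real definitions):

```python
# Simplified sketch of a DAG-level max_active_runs override (Airflow 1.10).
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(
    dag_id="example_dag",              # placeholder id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
    max_active_runs=10,                # overrides max_active_runs_per_dag (16) for this DAG
) as dag:
    task = KubernetesPodOperator(
        task_id="example_task",
        name="example-task",
        namespace="default",           # placeholder namespace
        image="python:3.7",            # placeholder image
        cmds=["python", "-c", "print('hello')"],
    )
```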
Any idea what could be causing it?