Reaching out regarding an issue we are facing with Airflow and Spark.
Setup: We are using Apache Airflow (v2.0.1) to schedule and monitor workflows for one of our projects. We have created a DAG that uses the SparkSubmitOperator (Spark v3.0.0).
Airflow > SparkSubmitOperator in cluster mode with Kubernetes as the Spark master (k8s://) > dynamic allocation and pod management of the Spark driver and executor pods on Kubernetes. A rough sketch of this wiring is shown below.
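For context, this is roughly how our DAG wires the operator; it is a minimal sketch, and the connection id ("spark_k8s", pointing at a k8s://https://... master with deploy-mode cluster in its extras), application path, image, and namespace are placeholders for our actual values:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_k8s_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_spark_job = SparkSubmitOperator(
        task_id="run_spark_job",
        conn_id="spark_k8s",                    # Airflow connection: k8s:// master, cluster deploy mode
        application="local:///opt/app/job.py",  # path to the job inside the Spark image
        name="spark-k8s-job",
        conf={
            # dynamic allocation on K8s; Spark 3.x uses shuffle tracking
            # instead of an external shuffle service
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.shuffleTracking.enabled": "true",
            "spark.kubernetes.container.image": "myrepo/spark-app:latest",
            "spark.kubernetes.namespace": "spark-jobs",
        },
        verbose=True,
    )
```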
Issue: When we trigger the DAG from the Airflow UI, it randomly gets stuck at some task: the UI keeps showing the task as running even though the job has completed in the driver pod. We have tested each task individually, and all of them execute successfully.
Below is the DAG diagram for your reference.
We see the logs below repeated for the stuck task in the Airflow UI.
Attempt to solve this issue: we added spark.stop() and sys.exit(0) to the Python jobs so that they return a proper exit status (sketched below), but no luck: Airflow still gets stuck on a task randomly.
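For reference, this is roughly the exit handling we added to each job; the workload shown is a placeholder for our actual job logic:

```python
import sys

from pyspark.sql import SparkSession


def main() -> int:
    spark = SparkSession.builder.appName("spark-k8s-job").getOrCreate()
    try:
        # ... actual job logic goes here ...
        spark.range(10).count()  # placeholder workload
        return 0
    finally:
        # stop the session so the driver pod can terminate cleanly
        spark.stop()


if __name__ == "__main__":
    sys.exit(main())
```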
I have been working on this issue for the last few days but couldn't resolve it; any leads or direction here would be helpful.