I would like to build a data pipeline and would like to use Airflow for it.
Some information about the project:
All tasks (input parameters) are stored in a database in advance by means of a script.
I want to start the pipeline manually. The process should then look like this:

- All tasks are retrieved from the database and placed in a queue.
- Each task consists of three dependent subtasks, which are processed step by step.
- For each subtask, a container is prepared in advance with information about the required number of CPUs and the amount of memory.
- Depending on the size of the cluster, the individual tasks are processed in parallel.
- Scaling should happen automatically based on the available cluster resources.
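To make this more concrete, here is a rough sketch of how I imagine it in Airflow (dynamic task mapping with mapped task groups, which needs Airflow 2.5+). The connection id "tasks_db", the table "pipeline_tasks" and the step functions are only placeholders for my setup, not anything that exists yet:

```python
from __future__ import annotations

import pendulum
from airflow.decorators import dag, task, task_group


@dag(
    schedule=None,  # no schedule: the DAG is only triggered manually
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def queued_pipeline():

    @task
    def fetch_task_params() -> list[dict]:
        """Load all pre-stored task parameters from the database."""
        from airflow.providers.postgres.hooks.postgres import PostgresHook

        hook = PostgresHook(postgres_conn_id="tasks_db")  # placeholder connection id
        rows = hook.get_records("SELECT id, params FROM pipeline_tasks")  # placeholder table
        return [{"id": r[0], "params": r[1]} for r in rows]

    @task
    def step_one(item: dict) -> dict:
        # first subtask; could also be a KubernetesPodOperator with its own
        # CPU/memory requests instead of a plain Python task
        return item

    @task
    def step_two(item: dict) -> dict:
        # second subtask
        return item

    @task
    def step_three(item: dict) -> dict:
        # third subtask
        return item

    @task_group
    def process_item(item: dict):
        # the three subtasks stay chained for each item
        step_three(step_two(step_one(item)))

    # one mapped group per database row; Airflow runs as many groups in
    # parallel as the executor and cluster resources allow
    process_item.expand(item=fetch_task_params())


queued_pipeline()
```

For the per-subtask CPU/memory requirements I was thinking of either the KubernetesExecutor (with a pod override per task) or the KubernetesPodOperator, and letting the cluster autoscaler handle the scaling, but I am not sure whether that is the right approach.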
Can this concept be implemented as described, or is there a simpler solution?
Thx Markus