Kubernetes Orchestration depending upon number of rows/records/Input Files

3/11/2020

Requirement is to orchestrate ETL containers depending upon the number of records present at the Source system (SQL/Google Analytics/SAAS/CSV files).

To explain take a Use Case:- ETL Job has to process 50K records present in SQL server, however, it takes good processing time to execute this job by one server/node as this server makes a connection with SQL, fetches the data and process the records.

Now the problem is how to orchestrate in Kubernetes this ETL Job so that it scales up/down the containers depending upon number of records/Input. Like the case discussed above if there are 50K records to process in parallel then it should scale up the containers process the records and scales down.

-- Tarun Narang
kubernetes

1 Answer

3/11/2020

You would generally use a queue of some kind and Horizontal Pod Autoscaler (HPA) to watch the queue size and adjust the queue consumer replicas automatically. Specifics depend on the exact tools you use.

-- coderanger
Source: StackOverflow