I am developing an ETL batch application using Spring Batch. My ETL process takes data from a pagination-based REST API and loads it into Google BigQuery. I would like to deploy this batch application in a Kubernetes cluster and take advantage of pod scalability. I understand Spring Batch supports both horizontal and vertical scaling. I have a few questions:
1) How do I deploy this ETL app on Kubernetes so that it creates pods on demand, using remote chunking / remote partitioning?
2) I am assuming there would be one main master pod and several slave (worker) pods provisioned based on load. Is that correct?
3) There is also the Kubernetes batch API (Jobs) available. Should I use the Kubernetes batch API or the Spring Cloud features? Which option is the better one?
I have used Spring Boot with Spring Batch and Spring Cloud Task to do something similar to what you want to do. Maybe it will help you.
The way it works is like this: I have a manager app that deploys pods on Kubernetes running my master application. The master application does some work and then starts the remote partitioning, deploying several other pods with "workers".
Trying to answer your questions:
1) You can create a Docker image of an application that contains a Spring Batch job. Let's call it the master application. The application that deploys the master application can use a TaskLauncher or an AppDeployer from Spring Cloud Deployer Kubernetes, as in the sketch below.
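For illustration, launching the master application from the manager could look roughly like this. This is a minimal sketch: the `my-registry/master-app:latest` image name and the `MasterAppLauncher` class are hypothetical, and the `TaskLauncher` is assumed to be a bean backed by spring-cloud-deployer-kubernetes (e.g. a `KubernetesTaskLauncher`):

```java
import org.springframework.cloud.deployer.resource.docker.DockerResource;
import org.springframework.cloud.deployer.spi.core.AppDefinition;
import org.springframework.cloud.deployer.spi.core.AppDeploymentRequest;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;

import java.util.Collections;
import java.util.Map;

public class MasterAppLauncher {

    // Backed by a KubernetesTaskLauncher when running against a cluster
    private final TaskLauncher taskLauncher;

    public MasterAppLauncher(TaskLauncher taskLauncher) {
        this.taskLauncher = taskLauncher;
    }

    public String launchMaster() {
        // Properties passed to the launched container's Spring environment
        Map<String, String> appProperties =
                Collections.singletonMap("spring.batch.job.enabled", "true");

        AppDefinition definition = new AppDefinition("master-app", appProperties);

        // Docker image packaging the Spring Batch master job (hypothetical name)
        DockerResource resource = new DockerResource("my-registry/master-app:latest");

        // On Kubernetes, launching the request results in a new pod for the task
        return taskLauncher.launch(new AppDeploymentRequest(definition, resource));
    }
}
```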
2) Correct. In this case you could use remote partitioning. Each partition would be executed by another Docker image containing a job; this would be your worker. An example of remote partitioning can be found here, and a sketch of the master- and worker-side wiring follows below.
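As a reference, the remote-partitioning wiring with Spring Cloud Task's DeployerPartitionHandler could look roughly like the sketch below. The `my-registry/worker-app:latest` image, the `workerStep` step name, the `etl-worker` application name, and the page-range numbers are all assumptions for illustration, and in a real setup the master and worker beans would live in their separate applications:

```java
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.cloud.deployer.resource.docker.DockerResource;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;
import org.springframework.cloud.task.batch.partition.DeployerPartitionHandler;
import org.springframework.cloud.task.batch.partition.DeployerStepExecutionHandler;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.HashMap;
import java.util.Map;

@Configuration
public class PartitionConfig {

    // Master side: launches one worker pod per partition via the TaskLauncher.
    @Bean
    public DeployerPartitionHandler partitionHandler(TaskLauncher taskLauncher,
                                                     JobExplorer jobExplorer) {
        // Docker image containing the worker job (hypothetical name)
        DockerResource workerImage = new DockerResource("my-registry/worker-app:latest");

        DeployerPartitionHandler handler =
                new DeployerPartitionHandler(taskLauncher, jobExplorer, workerImage, "workerStep");
        handler.setMaxWorkers(4);                  // cap on concurrent worker pods
        handler.setApplicationName("etl-worker");  // hypothetical app name
        return handler;
    }

    // Master side: splits the REST API's pages into ranges, one range per worker.
    @Bean
    public Partitioner partitioner() {
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putInt("pageStart", i * 100);          // hypothetical page ranges
                context.putInt("pageEnd", (i + 1) * 100 - 1);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }

    // Worker side (normally in the worker application): picks up the step
    // execution handed over by the master and runs it inside this pod.
    @Bean
    public DeployerStepExecutionHandler stepExecutionHandler(ConfigurableApplicationContext context,
                                                             JobExplorer jobExplorer,
                                                             JobRepository jobRepository) {
        return new DeployerStepExecutionHandler(context, jobExplorer, jobRepository);
    }
}
```

The master's partitioned step would plug the partitioner and partition handler together via the step builder, and the worker pods receive their step execution ids from the partition handler, so the worker image must have the `workerStep` bean on its classpath.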
3) In my case I used Spring Batch and managed to do everything I needed. The only problems I have now are with upscaling and downscaling my cluster. Since my workers are not stateless, I experience problems when instances are removed from the cluster mid-run. If you don't need to upscale or downscale your cluster, you are good to go.