Can Flink run multiple copies of the same job to achieve pseudo-dynamic scaling?

12/25/2018

We are working on dynamic scaling of Flink tasks. The job reads a stream from a Kafka topic, does ..., then sinks to another Kafka topic. We know that a Flink job must be stopped first in order to modify its parallelism, which is not what we want.

Since we can't dynamically add resources to a task without stopping the Flink job, can we duplicate the Flink job (consuming from the same Kafka topic through the same group.id) to increase throughput? Also, is it possible to use YARN or Kubernetes to manage those jobs and achieve pseudo-dynamic scaling for such a Flink-with-Kafka task?
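For reference, a minimal sketch of the kind of Kafka-to-Kafka job being asked about, assuming placeholder topic names, broker address, and a stand-in processing step; the shared group.id is the one the question proposes to reuse across duplicated job copies:

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaPipelineJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder broker
        // The group.id the question proposes to share across duplicated jobs.
        props.setProperty("group.id", "my-shared-group");

        env.addSource(new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props))
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   return value.toUpperCase();                  // stand-in for the real "do ..." step
               }
           })
           .addSink(new FlinkKafkaProducer<>("kafka:9092", "output-topic", new SimpleStringSchema()));

        env.execute("kafka-to-kafka-job");
    }
}
```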

-- snakie yu
apache-flink
kubernetes
yarn

1 Answer

12/29/2018

Is there a reason why you don't want to modify parallelism by stopping the job?

You could do this; however, you would effectively be splitting your data across the various jobs. Not only would you incur the cost of now needing to understand your throughput across multiple jobs in order to autoscale efficiently, but any stateful processing would also produce incorrect or inconsistent results.
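To illustrate the stateful-processing concern, here is a hedged sketch (the event shape, keys, and counting logic are hypothetical): a keyed running count is only correct if every event for a given key is processed by the same job. If independent copies of the job each see only part of the stream, each copy keeps its own partial count and neither matches the true total.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the Kafka source from the question: (key, increment) pairs.
        DataStream<Tuple2<String, Long>> events = env.fromElements(
                Tuple2.of("user-1", 1L), Tuple2.of("user-1", 1L), Tuple2.of("user-2", 1L));

        events
            .keyBy(0)   // key by the first tuple field
            .map(new RichMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                private transient ValueState<Long> count;

                @Override
                public void open(Configuration parameters) {
                    count = getRuntimeContext().getState(
                            new ValueStateDescriptor<>("count", Types.LONG));
                }

                @Override
                public Tuple2<String, Long> map(Tuple2<String, Long> event) throws Exception {
                    // Correct only if this operator sees every event for the key.
                    // Two independent job copies would each hold a separate, partial count.
                    Long current = count.value();
                    long updated = (current == null ? 0L : current) + event.f1;
                    count.update(updated);
                    return Tuple2.of(event.f0, updated);
                }
            })
            .print();

        env.execute("keyed-count-sketch");
    }
}
```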

-- Luka Jurukovski
Source: StackOverflow