Running Flink job when a TaskManager is killed / lost

10/1/2017

What I want to achieve is a Flink cluster that automatically re-allocates resources to keep the job running when there is a resource interruption, e.g. a Kubernetes pod scaling down or the loss of an existing TaskManager.

I tested with a Flink cluster consisting of:

  • one JobManager and 2 TaskManagers (2 task slots each),
  • restart strategy RestartStrategies.fixedDelayRestart(2, 2000) (see the sketch after this list),
  • checkpointing and state configured to go to HDFS,
  • the job started with parallelism 4, which utilized all the available slots,
  • the cluster will later run on top of Kubernetes and be managed by autoscaling.
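For reference, a minimal sketch of how the job environment is set up, assuming the Java DataStream API; the checkpoint interval, HDFS path, and class name are placeholders, not my actual values:

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class JobSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Fixed-delay restart: at most 2 restart attempts, 2000 ms between attempts
            env.setRestartStrategy(RestartStrategies.fixedDelayRestart(2, 2000));

            // Checkpoints and state go to HDFS (interval and path are placeholders)
            env.enableCheckpointing(10000);
            env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));

            // Uses all 4 available task slots (2 TaskManagers x 2 slots each)
            env.setParallelism(4);

            // ... job topology and env.execute(...) go here
        }
    }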

Scenario: When I kill one of the TaskManagers, the Flink cluster keeps running with 1 JobManager and 1 TaskManager. The job then restarts and eventually fails, because it tries to resume with its previous state (parallelism 4) and complains about insufficient resources in the Flink cluster.

Is there a way for me to restart the job so that it dynamically adapts to the resources that are actually available, instead of restarting with the previous state?

I would appreciate it if someone could shed some light on this.

-- coffee_latte1020
apache-flink
kubernetes
scaling

0 Answers