Flink Statefun HA Kubernetes cluster

5/21/2020

I'm trying to deploy a highly available Flink cluster on Kubernetes. In the examples below, the worker nodes are replicated, but there is only one master pod.

https://github.com/apache/flink-statefun

As far as I understand, there are two approaches to making the job manager HA.

  1. https://ci.apache.org/projects/flink/flink-docs-stable/ops/jobmanager_high_availability.html
  2. https://medium.com/hepsiburadatech/high-available-flink-cluster-on-kubernetes-setup-73b2baf9200e

In the first approach, we deploy a second job manager and switch between the two in case of failure. In the second approach, Kubernetes redeploys the job manager pod in case of failure.

So I have a few questions:

  • In both approaches, what happens to the running jobs when the active job manager fails?
  • Can the first approach be applied on Kubernetes?
  • In the second approach, if the job manager fails, the Flink UI will be unavailable until the pod recovers, whereas in the first approach it will stay available. Am I right?
  • What are the pros and cons of each approach?

-- Arif Ezberci
apache-flink
flink-statefun
high-availability
kubernetes

1 Answer

5/21/2020

There is only one approach to making the job manager HA. Both of your links use JobManager (JM) HA backed by a ZooKeeper cluster to build an active/standby architecture for the JM.

  1. When the JobManager fails, there is a "failover" as described in the Apache Flink documentation (your first link): the standby JM becomes active.
  2. Of course. Kubernetes is just where the whole Flink cluster is deployed; you can still use the HA cluster mode with ZooKeeper.
  3. No, both will perform the "failover", and a standby JM will become active.
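
To make the ZooKeeper-based HA mode mentioned above more concrete, here is a minimal sketch that renders the relevant `flink-conf.yaml` entries. The four `high-availability*` keys are standard Flink configuration options; the ZooKeeper hosts, the storage path, and the cluster id are placeholders I made up, and `render_ha_config` is just a hypothetical helper for illustration.

```python
# Sketch: render the ZooKeeper HA entries for flink-conf.yaml.
# The quorum hosts, storage path, and cluster id passed in below are
# placeholders (assumptions), not values from the question.

def render_ha_config(zk_quorum, storage_dir, cluster_id):
    """Return flink-conf.yaml lines that enable ZooKeeper-based JobManager HA."""
    settings = {
        # Switch the JobManager from no HA to ZooKeeper leader election.
        "high-availability": "zookeeper",
        # Comma-separated host:port list of the ZooKeeper ensemble.
        "high-availability.zookeeper.quorum": ",".join(zk_quorum),
        # Durable storage for JobManager metadata; ZooKeeper only keeps pointers to it.
        "high-availability.storageDir": storage_dir,
        # Namespace that separates this Flink cluster from others in the same ensemble.
        "high-availability.cluster-id": cluster_id,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_ha_config(
        zk_quorum=["zk-0.zk:2181", "zk-1.zk:2181", "zk-2.zk:2181"],
        storage_dir="hdfs:///flink/ha/",
        cluster_id="/statefun-cluster",
    ))
```

The same entries apply whether the JobManagers run on bare servers or in pods, which is the point of the answer: the HA mechanism lives in Flink and ZooKeeper, not in the deployment platform.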

What you are missing is that Kubernetes is only the deployment platform for Flink. Just as you can deploy Flink on physical or virtual servers, you can deploy it on Kubernetes, but things like high availability stay the same.

EDIT: You can run two or more JobManager pods in Kubernetes, and then it will be equivalent to the first solution.
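
As a rough illustration of running more than one JobManager pod, the sketch below builds a Kubernetes Deployment manifest with `replicas: 2` and prints it as JSON, which `kubectl apply -f` accepts. The Deployment fields are standard `apps/v1` fields; the resource name, labels, image tag, and container arguments are assumptions of mine, and the sketch also assumes that ZooKeeper HA is already enabled in the Flink configuration the pods load. Without that, the extra pod would not act as a standby.

```python
import json

# Sketch: a JobManager Deployment with more than one replica.
# Name, labels, image, and args are illustrative assumptions.

def jobmanager_deployment(replicas=2, image="flink:1.10"):
    """Build a Deployment manifest (as a dict) that runs several JobManager pods."""
    labels = {"app": "flink", "component": "jobmanager"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "flink-jobmanager"},
        "spec": {
            # One pod is elected leader; the others wait as standbys.
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "jobmanager",
                            "image": image,
                            "args": ["jobmanager"],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(jobmanager_deployment(), indent=2))
```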

-- ShemTov
Source: StackOverflow