Apache Flink deployment in Kubernetes - availability & scalability

2/13/2020

I use Kubernetes (Openshift) to deploy many microservices. I wish to utilise the same to deploy some of my Flink jobs. Flink jobs are critical - some jobs are stateless that process every data (exactly once), some jobs are stateful that looks for patterns in the stream or react to time. No jobs can tolerate long downtime or frequent shutdown (due to programming errors, the way Flink quits).

I find docs mostly lean to deploy Flink jobs in k8s as Job Cluster. But how should one take a practical approach in doing it?

  • Though k8s can restart the failed Flink pod, how can Flink restore its state to recover?
  • Can the Flink pod be replicated more than one? How do the JobManager & TaskManager works when two or more pods exists? If not why? Other approaches?
-- Vijay Veeraraghavan
apache-flink
flink-streaming
kubernetes

1 Answer

2/13/2020

Though k8s can restart the failed Flink pod, how can Flink restore its state to recover?

From Flink Documentation we have:

Checkpoints allow Flink to recover state and positions in the streams to give the application the same semantics as a failure-free execution.

It means that you need to have a Check Storage mounted in your pods to be able to recover the state.

In Kubernetes you could use Persistent Volumes to share the data across your pods.

Actually there are a lot of supported plugins, see here.

You can have more replicas of TaskManager, but in Kubernetes you don't need to take care of HA for JobManager since you can use Kubernetes self-healing deployment.

To use self-healing deployment in Kubernetes you just need to create a deployment and set the replica to 1, like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - name: http
          containerPort: 80
        imagePullPolicy: IfNotPresent

Finally, you can check this links to help you setup Flink in Kubernetes:

running-apache-flink-on-kubernetes

Flink Job cluster on Kubernetes

Flink Kubernetes Deployments

Running Flink on Kubernetes with KUDO

-- KoopaKiller
Source: StackOverflow