Why does Flink use Yarn?

3/1/2018

I am taking a deep look inside Flink to see how I can use it on a project and had a question for the creators / high level thinkers... why does Flink use Yarn as the default resource manager?

Was Kubernetes considered? Or is it one of those things where we started on Yarn, it works pretty well...

I have come across many projects and articles that allow Kubernetes and Yarn to work together in cluding the Myraid project that allows yarn to be deployed with Mesos (but I am on Kubernetes...)

I have a very large compute cluster 2000 or so nodes that I use and I want to use the super cool CEP features of Flink feeding off a Kafka infrastructure (also deployed on to this kubernetes environment).

I am looking to understand the reasons behind using Yarn as the resource manager underneath Flink and if would be possible (with some effort and contribution to the project) to make Kubernetes an option alongside Yarn.

Please note - I am new to Yarn - just reading up about it. Also new to Flink and learning about the deployment and scale-out architecture.

-- Anu
apache-flink
kubernetes
yarn

1 Answer

3/1/2018

Flink is not tied to YARN. It can also run on Apache Mesos and there are also users running it on Kubernetes. In the current version (Flink 1.4.1), there are a few things to consider when running Flink in Kubernetes (see this talk by Patrick Lucas).

The Flink community is also currently working on improving Flink's support for container setups. The effort is called FLIP-6 and will be included in the next release (Flink 1.5.0).

-- Fabian Hueske
Source: StackOverflow