Can't submit job with Flink 1.5 cluster

6/11/2018

I am trying to move from Flink 1.3.2 to 1.5. We have a cluster deployed with Kubernetes. Everything works fine with 1.3.2, but I cannot submit a job with 1.5. When I try, I just see the spinner spinning indefinitely; the same happens via the REST API. I can't even submit the WordCount example job. It seems my taskmanagers cannot connect to the jobmanager: I can see them in the Flink UI, but in the logs I see:

level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123

level=WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]

level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123

But I can telnet from the taskmanager to the jobmanager.
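For anyone repeating the telnet check from inside a pod that has no telnet client, the same test can be sketched in Python. This is only a minimal sketch: the function name `can_connect` and the 5-second timeout are illustrative assumptions, while the host name and port 6123 in the usage comment are the ones from the log lines above. Note that a successful TCP connect only shows the port is reachable; it does not prove the Akka association will succeed.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper, roughly equivalent to `telnet host port`.
    DNS failures, refused connections and timeouts all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError
        return False


# Example call from a taskmanager pod (host and port taken from the logs):
# can_connect("flink-jobmanager-nonprod-2.rpds.svc.cluster.local", 6123)
```

If this returns True from the taskmanager while the logs still show ConnectTimeoutException, the timeout is probably not a plain network reachability problem.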

Moreover, everything works on my local machine if I start Flink in cluster mode (jobmanager + taskmanager). In the 1.5 documentation I found the mode option, which switches between flip6 and legacy (default flip6), but if I set mode: legacy I don't see my taskmanagers registered at all.

Is there something specific to a k8s deployment with 1.5 that I need to do? I checked the 1.5 k8s config and it looks pretty much the same as ours, but we are using a customized Docker image for Flink (security, HA, checkpointing).

Thank you.

-- Georgy Gobozov
akka
apache-flink
jobs
kubernetes
scala

1 Answer

7/25/2018

The issue is with jobmanager connectivity. The jobmanager Docker image cannot connect to the "flink-jobmanager" (${JOB_MANAGER_RPC_ADDRESS}) address.

Just use the afilichkin/flink-k8s Docker image instead of flink:latest.

I fixed it by adding a new host to the jobmanager Docker image. You can see it in my GitHub project:

https://github.com/Aleksandr-Filichkin/flink-k8s/tree/master
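To see whether the jobmanager container can resolve its own RPC address at all, a small check like the one below can be run inside that container. It is a diagnostic sketch only, not the fix from the linked project: the function name `resolve` is hypothetical, and "flink-jobmanager" in the usage comment is the address quoted above.

```python
import socket


def resolve(hostname):
    """Return the sorted list of IP addresses hostname resolves to.

    Hypothetical helper; returns an empty list if resolution fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


# Example call inside the jobmanager container:
# resolve("flink-jobmanager")
```

An empty result would suggest the name is not known inside the container, which is consistent with a missing host being the cause.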

-- Aleksandr Filichkin
Source: StackOverflow