I am trying to move from Flink 1.3.2 to 1.5. Our cluster is deployed on Kubernetes. Everything works fine with 1.3.2, but I cannot submit a job with 1.5. When I try, the spinner in the UI just spins indefinitely, and the same happens via the REST API. I can't even submit the WordCount example job. It seems my taskmanagers cannot connect to the jobmanager: I can see them in the Flink UI, but in the logs I see:
level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123
level=WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]
level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123
However, I can telnet from a taskmanager to the jobmanager.
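For anyone who wants to repeat the telnet test from a script, here is a minimal sketch of an equivalent TCP check in Python. The function name `can_connect` is made up for illustration; the host and port in the usage note are taken from the log lines above. It only tests that a TCP connection can be opened, which is what telnet shows, and says nothing about whether the Akka handshake succeeds afterwards.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        return False
```

Run from inside a taskmanager pod, `can_connect("flink-jobmanager-nonprod-2.rpds.svc.cluster.local", 6123)` should return True if the port is reachable from there.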
Moreover, everything works locally if I start Flink in cluster mode (jobmanager + taskmanager). In the 1.5 documentation I found the mode option, which switches between flip6 and legacy (default: flip6), but if I set mode: legacy I don't see my taskmanagers registered at all.
Is there something specific to a k8s deployment with 1.5 that I need to do? I checked the 1.5 k8s config and it looks pretty much the same as ours, but we are using a customized Docker image for Flink (security, HA, checkpointing).
Thank you.
The issue is with jobmanager connectivity. The jobmanager Docker image cannot connect to the "flink-jobmanager" (${JOB_MANAGER_RPC_ADDRESS}) address.
Just use the afilichkin/flink-k8s Docker image instead of flink:latest.
I fixed it by adding a new host to the jobmanager Docker image. You can see it in my GitHub project:
https://github.com/Aleksandr-Filichkin/flink-k8s/tree/master
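To check whether you are hitting the same problem before switching images, a rough sketch like the following, run inside the jobmanager container, would show whether the RPC address name resolves there at all. The helper name `resolve_host` is hypothetical, and the fallback name "flink-jobmanager" is simply the one quoted above; this is a diagnostic under those assumptions, not the fix from the linked project.

```python
import socket
from typing import Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address `hostname` resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (name not known) is a subclass of OSError
        return None
```

For example, `resolve_host(os.environ.get("JOB_MANAGER_RPC_ADDRESS", "flink-jobmanager"))` (after `import os`) returning None would suggest the jobmanager cannot resolve its own RPC address, which would be consistent with the host entry being missing.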