Java kafka-clients can't reconnect after broker restart

10/2/2018

I'm seeing the same problem in two applications, one built on kafka-streams and one on spring-kafka. The first uses the kafka-clients:1.0.0 library, the other version 1.0.2.

There is just one broker instance running in Kubernetes (KAFKA_ADVERTISED_LISTENERS="PLAINTEXT://${POD_IP}:9092"). It is deployed as a StatefulSet, and the client app reaches it through the headless service's internal endpoint (I've also tried the cluster IP, and the issue is the same).

Once I delete the Kafka pod and it is recreated, my client application can't reconnect. The pod does come back with a different IP address, but since I'm accessing it through the service's internal endpoint, I expect my client app to resolve the new address. That is not happening.
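To check what the service name resolves to from inside the client pod, a small lookup like the one below can help. This is only a diagnostic sketch: the class name `ResolveCheck` and the default host name are hypothetical placeholders, not values from this setup.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ResolveCheck {
    /** Returns every IP address the JVM currently resolves for the given host name. */
    public static List<String> resolve(String host) {
        try {
            List<String> addresses = new ArrayList<>();
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
            return addresses;
        } catch (UnknownHostException e) {
            // Wrap so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical headless-service name; pass the real one as the first argument.
        String host = args.length > 0 ? args[0] : "kafka-0.kafka.default.svc.cluster.local";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Running it before and after deleting the pod shows whether the JVM itself sees the new address, which separates a DNS problem from a client-library problem.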

The kafka-clients library keeps logging "Found least loaded node [old_ip]:9092 (id: 0 rack: null)", even though nothing is running at that address anymore.

The JVM DNS cache TTL is not the problem, as I've set it so that cached entries are refreshed periodically.
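For reference, a minimal sketch of how the JVM DNS cache TTL can be lowered programmatically. The class name `DnsCacheTtl` and the 30 s / 5 s values are illustrative assumptions, not the exact settings used here; `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` are the standard JVM security properties.

```java
import java.security.Security;

public class DnsCacheTtl {
    /**
     * Sets the JVM-wide DNS cache TTLs, in seconds, and returns the positive TTL
     * as the JVM now reports it. Call this early, before the first host name
     * lookup, because the JVM reads the cache policy when it is first needed.
     */
    public static String configure(int positiveSeconds, int negativeSeconds) {
        // How long successful lookups are cached.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveSeconds));
        // How long failed lookups are cached.
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeSeconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println("networkaddress.cache.ttl = " + configure(30, 5));
    }
}
```

The same properties can also be set in the JVM's `java.security` file instead of in code.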

Restarting the client application solves the problem.

If using ${POD_IP} in KAFKA_ADVERTISED_LISTENERS causes this problem, would advertising the pod's hostname solve it? Or is there a way to make my client resolve the new address?

-- sowieso-fruehling
apache-kafka
kubernetes

1 Answer

5/7/2019

This seems to be related to KAFKA-7755. Updating the client to version 2.1.1 or 2.2.0 should help.
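To illustrate the kind of behaviour that ticket is about, as far as I understand it (a client that resolves a broker host name once and keeps reusing the result), here is a toy sketch. It is not Kafka client code: `ResolveOnReconnect`, `CachingTarget`, `ReResolvingTarget`, the host name, and the IP addresses are all hypothetical, and DNS is faked with a map so the example runs anywhere.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ResolveOnReconnect {
    /** Resolves the host once at construction and reuses that address forever. */
    public static final class CachingTarget {
        private final String address;

        public CachingTarget(String host, Function<String, String> dns) {
            this.address = dns.apply(host);
        }

        public String nextConnectAddress() {
            return address;
        }
    }

    /** Resolves the host again before every connection attempt. */
    public static final class ReResolvingTarget {
        private final String host;
        private final Function<String, String> dns;

        public ReResolvingTarget(String host, Function<String, String> dns) {
            this.host = host;
            this.dns = dns;
        }

        public String nextConnectAddress() {
            return dns.apply(host);
        }
    }

    public static void main(String[] args) {
        // Fake DNS table standing in for the cluster's resolver.
        Map<String, String> fakeDns = new HashMap<>();
        fakeDns.put("kafka-0.kafka", "10.0.0.5");

        CachingTarget stale = new CachingTarget("kafka-0.kafka", fakeDns::get);
        ReResolvingTarget fresh = new ReResolvingTarget("kafka-0.kafka", fakeDns::get);

        // Simulate the pod being recreated with a new IP.
        fakeDns.put("kafka-0.kafka", "10.0.0.9");

        System.out.println(stale.nextConnectAddress()); // prints 10.0.0.5
        System.out.println(fresh.nextConnectAddress()); // prints 10.0.0.9
    }
}
```

Re-resolution can only help when the client is given a name to resolve. If the broker advertises a raw pod IP, there is nothing to look up again, so advertising a stable hostname is probably needed as well.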

-- Ivan Ponomarev
Source: StackOverflow