I am currently running Kafka and Zookeeper in a Kubernetes cluster. I have created one headless service, a statefulset, and a node balancer to expose the Kafka cluster outside of Kubernetes.
The problem is that the brokers (pods) do not always seem to be connected to the topic properly. For instance, when I run Kafkacat in consumer mode against a topic called test, which has three partitions, sometimes some partitions never reach the end of the topic (or it does not connect to every partition), so some messages cannot be produced or consumed. This happens at random; sometimes it works fine. (I have to wait a while or restart until it works properly, but after some time it fails again.)
Some of the error messages that I often get:
Error when sending message to topic XXX with key: null, value: X bytes with error:
WARN Got error produce response with correlation id 6 on topic-partition test-1, retrying (2 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
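Since NOT_LEADER_FOR_PARTITION suggests the request reached a broker that is not the partition leader, one check that might narrow this down is whether all brokers report the same leader for each partition. The following is only a minimal sketch: the function name find_leader_disagreements, the broker names kafka-0 to kafka-2, and the leader ids are hypothetical sample values, and it assumes the per-broker views were copied by hand from the output of `kafkacat -L -b <broker> -t test` run against each broker separately.

```python
from collections import defaultdict


def find_leader_disagreements(views):
    """Return partitions for which the brokers report different leaders.

    `views` maps a broker name to {partition_id: leader_broker_id}.
    The result maps each disputed partition to {leader_id: [brokers reporting it]}.
    """
    reported = defaultdict(lambda: defaultdict(list))
    for broker, leaders in views.items():
        for partition, leader in leaders.items():
            reported[partition][leader].append(broker)
    # Keep only partitions where more than one distinct leader was reported.
    return {
        partition: {leader: sorted(brokers) for leader, brokers in by_leader.items()}
        for partition, by_leader in reported.items()
        if len(by_leader) > 1
    }


# Hypothetical sample data: kafka-1 reports a different leader for partition 1.
views = {
    "kafka-0": {0: 0, 1: 1, 2: 2},
    "kafka-1": {0: 0, 1: 2, 2: 2},
    "kafka-2": {0: 0, 1: 1, 2: 2},
}
print(find_leader_disagreements(views))
```

An empty result would mean the brokers agree on the leaders, which would point away from stale broker metadata and more toward how clients are routed to the brokers.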
I am currently using https://github.com/kubernetes-retired/contrib/tree/master/statefulsets/kafka as the basis for my setup.
Could someone give me a hint about where this randomness is coming from, or suggest other tests I can run to investigate possible causes? I hope my question is clear enough.