I am running Kafka on Kubernetes using the Strimzi Kafka operator. I am using the incremental cooperative (sticky) rebalance strategy, configured on my consumers as follows:
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        org.apache.kafka.clients.consumer.CooperativeStickyAssignor.class.getName());
Each time I scale the consumers in my consumer group, all existing consumers in the group throw the following exception:
Exception in thread "main" org.apache.kafka.common.errors.RebalanceInProgressException: Offset commit cannot be completed since the consumer is undergoing a rebalance for auto partition assignment. You can try completing the rebalance by calling poll() and then retry the operation
Any idea what is causing this exception and/or how to resolve it?
Thank you.
A consumer group rebalance happens whenever the group's metadata changes, for example when a consumer joins or leaves the group or when partitions are added to a subscribed topic.
Adding more consumers to a group (scaling, in your words) is one such change, and it triggers a rebalance. During the rebalance, partitions are redistributed among the consumers, so a consumer cannot commit offsets until the reassignment is complete. The StickyAssignor does try to preserve the previous assignment as much as possible, but the rebalance is still triggered, and an even distribution of partitions takes precedence over retaining the previous assignment. (Reference - Kafka Documentation)
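To make the "balance first, then stickiness" rule concrete, here is a toy Java model. It is a simplified sketch under my own assumptions, not the actual code of Kafka's StickyAssignor or CooperativeStickyAssignor: every consumer is given floor(P/C) or ceil(P/C) partitions, existing owners keep what they can within that quota, and only the remainder moves. The class and method names are hypothetical.

```java
import java.util.*;

// Toy model of a sticky, balanced reassignment (NOT Kafka's real assignor code).
public class StickyRebalanceSketch {

    // Returns consumer -> sorted partitions. Each consumer gets floor(P/C) or ceil(P/C)
    // partitions (balance) and keeps its previous partitions within that quota (stickiness).
    static Map<String, List<Integer>> reassign(
            Map<String, List<Integer>> previous, List<String> consumers, int partitionCount) {
        // Consumers that owned more partitions get the larger quotas first, so fewer partitions move.
        List<String> ordered = new ArrayList<>(consumers);
        ordered.sort(Comparator.comparingInt(
                (String c) -> previous.getOrDefault(c, List.of()).size()).reversed());

        int base = partitionCount / ordered.size();
        int extra = partitionCount % ordered.size();
        Map<String, Integer> quota = new HashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            quota.put(ordered.get(i), base + (i < extra ? 1 : 0));
        }

        // Pass 1 (stickiness): keep previously owned partitions up to the quota.
        Map<String, List<Integer>> result = new TreeMap<>();
        Set<Integer> kept = new HashSet<>();
        for (String c : ordered) {
            List<Integer> mine = new ArrayList<>();
            for (int p : previous.getOrDefault(c, List.of())) {
                if (mine.size() < quota.get(c) && p < partitionCount) {
                    mine.add(p);
                    kept.add(p);
                }
            }
            result.put(c, mine);
        }

        // Pass 2 (balance): give every remaining partition to a consumer below its quota.
        for (int p = 0; p < partitionCount; p++) {
            if (kept.contains(p)) {
                continue;
            }
            for (String c : ordered) {
                if (result.get(c).size() < quota.get(c)) {
                    result.get(c).add(p);
                    break;
                }
            }
        }
        for (List<Integer> parts : result.values()) {
            Collections.sort(parts);
        }
        return result;
    }

    // Counts partitions assigned to a consumer that did not own them before.
    static int moved(Map<String, List<Integer>> before, Map<String, List<Integer>> after) {
        int count = 0;
        for (Map.Entry<String, List<Integer>> e : after.entrySet()) {
            List<Integer> old = before.getOrDefault(e.getKey(), List.of());
            for (int p : e.getValue()) {
                if (!old.contains(p)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = new TreeMap<>();
        before.put("c1", List.of(0, 1, 2));
        before.put("c2", List.of(3, 4, 5));

        // Scale from two to three consumers over six partitions.
        Map<String, List<Integer>> after = reassign(before, List.of("c1", "c2", "c3"), 6);
        System.out.println(after);
        System.out.println("moved: " + moved(before, after));
    }
}
```

Running it prints `{c1=[0, 1], c2=[3, 4], c3=[2, 5]}` and `moved: 2`: with 6 partitions split 3/3, balance forces two partitions to move to the new consumer even though both original consumers are still alive. Under the cooperative protocol only those moved partitions are revoked from their previous owners; the other partitions stay with their owners throughout the rebalance.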
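Beyond that, the exception message explains itself: while a rebalance is in progress, some operations, such as committing offsets, are not allowed. It also suggests the remedy: call poll() so the consumer can complete the rebalance, and then retry the commit.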
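Following the exception message's own advice, here is a hedged sketch of a commit that polls once and retries when a rebalance is in progress. To keep it runnable without the kafka-clients library, `RebalanceInProgress` and `ConsumerOps` are hypothetical stand-ins for `RebalanceInProgressException` and the `KafkaConsumer` methods `commitSync(Map<TopicPartition, OffsetAndMetadata>)` and `poll(Duration)`; the offsets map uses a plain partition number as its key.

```java
import java.util.Map;

public class CommitRetrySketch {

    // Hypothetical stand-in for org.apache.kafka.common.errors.RebalanceInProgressException.
    static class RebalanceInProgress extends RuntimeException {}

    // Hypothetical stand-in for the two KafkaConsumer calls used below.
    interface ConsumerOps {
        void commitSync(Map<Integer, Long> offsets); // partition -> next offset to consume
        void poll();                                 // lets the consumer finish the rebalance
    }

    // Commits the given offsets; on a rebalance, polls once and retries.
    // Returns the attempt that succeeded, or rethrows after maxAttempts failures.
    static int commitWithRetry(ConsumerOps consumer, Map<Integer, Long> offsets, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                consumer.commitSync(offsets);
                return attempt;
            } catch (RebalanceInProgress e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                consumer.poll(); // complete the rebalance, then loop to retry the commit
            }
        }
    }

    public static void main(String[] args) {
        // Fake consumer: the first commit hits a rebalance; the poll() in between finishes it.
        ConsumerOps fake = new ConsumerOps() {
            private boolean rebalancing = true;

            @Override
            public void commitSync(Map<Integer, Long> offsets) {
                if (rebalancing) {
                    throw new RebalanceInProgress();
                }
                System.out.println("committed " + offsets);
            }

            @Override
            public void poll() {
                rebalancing = false;
            }
        };
        System.out.println("attempts = " + commitWithRetry(fake, Map.of(0, 42L), 3));
    }
}
```

Passing an explicit offsets map matters in this pattern: the no-argument `commitSync()` commits the offsets returned by the last `poll()`, so after the extra `poll()` it could commit records that have not been processed yet. In a real consumer, the records returned by that `poll()` still have to be processed.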
How to avoid such situations?
This is a tricky one, because Kafka needs rebalancing in order to work effectively. There are a few practices you can use to reduce its impact:
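- Increase max.poll.interval.ms, so that a consumer that takes longer to process a batch is less likely to be considered failed, which reduces the possibility of experiencing these exceptions.
- Decrease max.poll.records or max.partition.fetch.bytes, so that each poll() returns less data and the consumer gets back to poll() sooner.
- Use group.initial.rebalance.delay.ms (a broker setting) for empty consumer groups, either on the first deployment or when destroying everything and redeploying again.

These techniques can only help you reduce unnecessary rebalances and the resulting exceptions; they will NOT prevent rebalancing completely.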
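The consumer-side settings above can be sketched as plain `java.util.Properties` using the real config keys. The values are illustrative assumptions, not tuned recommendations, and a real consumer would also need `bootstrap.servers`, `group.id` and deserializers before being passed to `new KafkaConsumer<>(props)`. The helper is a back-of-the-envelope check, assuming a fixed and hypothetical per-record processing time, that `max.poll.records` times that time stays below `max.poll.interval.ms`. `group.initial.rebalance.delay.ms` is a broker setting, so it is not part of these properties; with Strimzi it would go into the broker configuration of the Kafka custom resource.

```java
import java.util.Properties;

public class RebalanceTuningSketch {

    // Consumer-side settings from the list above; the values are illustrative, not recommendations.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        props.put("max.poll.interval.ms", "600000");      // default is 300000 (5 minutes)
        props.put("max.poll.records", "100");             // default is 500
        props.put("max.partition.fetch.bytes", "262144"); // default is 1048576 (1 MiB)
        return props;
    }

    // Worst-case time to process one poll() batch, assuming a fixed per-record cost.
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return maxPollRecords * perRecordMs;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        int maxPollRecords = Integer.parseInt(props.getProperty("max.poll.records"));
        long maxPollIntervalMs = Long.parseLong(props.getProperty("max.poll.interval.ms"));
        long perRecordMs = 50; // hypothetical measured processing time per record

        long batchMs = worstCaseBatchMs(maxPollRecords, perRecordMs);
        System.out.println("worst-case batch: " + batchMs + " ms, limit: " + maxPollIntervalMs
                + " ms, within limit: " + (batchMs < maxPollIntervalMs));
    }
}
```

With these numbers a full batch takes at most 5000 ms, far below the 600000 ms limit; if the worst-case batch time approached the limit, lowering max.poll.records would be the first lever to pull.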