Kafka consumer group offset goes down to -1

7/10/2018

We run a Kafka cluster in Kubernetes based on the gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 Docker image, with ZooKeeper as the backend using gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10. There are three instances each of Kafka and ZooKeeper.

We have a few different consumer groups that both consume and produce data on three different topics.

Behaviour: Sometimes a consumer group will have its offset for a partition of a topic set to -1, and from then on it stops consuming from that topic altogether. If we restart our consumers, we might see them set their offset to the latest offset, which might mean that the consumer has missed the messages that arrived between the offset going to -1 and the restart.

I'm having trouble working out why a consumer group's offset would ever be set to -1, and why this would happen "randomly" after days of uptime. Is there any logical explanation for why Kafka would set this offset for a certain consumer? I cannot see anything in our actual consumers that indicates they are doing this explicitly.

We currently have consumers written in both Go and Node.js, and all of them are facing this issue, so our current assumption is that the problem lies not with our consumers but with our Kafka setup.

-- poppe
apache-kafka
docker
kafka-consumer-api
kubernetes

1 Answer

5/21/2019

The default for the offset retention setting, offsets.retention.minutes, used to be 1 day (1440 minutes), and in older Kafka versions committed offsets were wiped out even for active consumers once the retention period had passed. This was fixed with KIP-211.

We originally discovered this with Kafka 0.10.2.1. A few select topics lost their consumer group offsets (i.e., they turned to -1) because no messages had arrived on the topic for a couple of days, so the offset retention policy kicked in and wiped out the offsets even though the consumers were still active.
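
To make the timing concrete, here is a minimal sketch in Python of the expiry rule as described above. It is a simplified model and not Kafka's actual implementation: it assumes an offset becomes eligible for removal as soon as its last commit is older than the retention window, whether or not the group is still connected. The function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

OLD_DEFAULT_RETENTION_MINUTES = 1440  # the old 1-day default mentioned above


def offset_expired(last_commit: datetime, now: datetime, retention_minutes: int) -> bool:
    """Simplified model: the offset is eligible for removal once the last
    commit is older than the retention window, even for an active group."""
    return now - last_commit > timedelta(minutes=retention_minutes)


# Hypothetical timeline: the last message on the topic (and therefore the
# last offset commit) happened at this time.
last_commit = datetime(2018, 7, 1, 12, 0)

# Topic idle for 12 hours: still inside the 1-day window.
expired_after_12h = offset_expired(
    last_commit, last_commit + timedelta(hours=12), OLD_DEFAULT_RETENTION_MINUTES
)

# Topic idle for 2 days: the offset is past the 1-day window.
expired_after_2d = offset_expired(
    last_commit, last_commit + timedelta(days=2), OLD_DEFAULT_RETENTION_MINUTES
)

print(expired_after_12h, expired_after_2d)
```

Under this model a topic that stays quiet for longer than the retention window loses its committed offset, which matches the "couple of days without messages" pattern we saw.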

We were able to work around this by increasing the retention setting to 7 days, which seems to be the default that Kafka itself ended up adopting; see KIP-186.
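
For reference, the change amounts to one line in the broker configuration. This is a sketch assuming the brokers read a standard server.properties file; in a Kubernetes setup the value may instead have to be passed through the container's startup command or environment, depending on the image.

```properties
# Keep committed consumer group offsets for 7 days (7 * 24 * 60 minutes)
offsets.retention.minutes=10080
```

This is a broker-side setting, so it typically needs to be applied to every broker in the cluster.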

-- Drupad Panchal
Source: StackOverflow