Duplicate baseOffset in Kafka .log file

10/29/2021

I have a Kafka cluster with 3 brokers deployed in a Kubernetes cluster. The Kafka version is 2.4.1 and the ZooKeeper version is 3.4.10. The problem is that some topics in this cluster cannot be replicated.
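
For reference, the under-replicated partitions can be listed with the stock topics tool (a sketch; the bootstrap address is taken from the listener config further down and may need adjusting):

# list every partition whose ISR is smaller than its replica set
/opt/kafka_2.12-2.4.1/bin/kafka-topics.sh --describe \
  --bootstrap-server 45.4.34.24:29092 \
  --under-replicated-partitions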

Topic viid_msgHandleVsts_uos

broker-0 (leader) Exception

2021-10-29 09:44:45 ERROR [ReplicaManager broker=0] Error processing fetch with max size 1048576 from replica [1] on partition viid_msgHandleVsts_uos-9: (fetchOffset=17285665, logStartOffset=7644573, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Found record size -1 smaller than minimum record overhead (14) in file /var/lib/kafka/viid_msgHandleVsts_uos-9/00000000000016550653.log.

broker-2 (follower) Exception

2021-10-29 09:44:46 ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=1] Error for partition viid_msgHandleVsts_uos-9 at offset 17285651 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
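
For context, replica divergence between brokers can be probed with the stock verification tool (a sketch, again assuming the broker address from the config below; it reports per-partition lag and flags replicas whose contents do not match):

# continuously compare replica contents across brokers for the affected topic
/opt/kafka_2.12-2.4.1/bin/kafka-replica-verification.sh \
  --broker-list 45.4.34.24:29092 \
  --topic-white-list viid_msgHandleVsts_uos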

This is the CorruptRecordException I get when I dump the index file:

[root@k8s-10 naning]# sh /opt/kafka_2.12-2.4.1/bin/kafka-run-class.sh -Dlog4j.configuration=file:./log4j.properties kafka.tools.DumpLogSegments --files 00000000000016550653.index
...
offset: 17285607 position: 619818841
offset: 17285612 position: 619823025
offset: 17285617 position: 619827209
offset: 17285622 position: 619831351
offset: 17285627 position: 619835619
offset: 17285632 position: 619839761
offset: 17285637 position: 619844029
offset: 17285642 position: 619848255
Exception in thread "main" org.apache.kafka.common.errors.CorruptRecordException: Found record size -1 smaller than minimum record overhead (14) in file 00000000000016550653.log.
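
For what it's worth, DumpLogSegments can also just sanity-check an index without printing its entries (a sketch using the same tool):

# verify the index entries without dumping them
sh /opt/kafka_2.12-2.4.1/bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files 00000000000016550653.index --index-sanity-check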

Then I dumped the .log file and found a duplicate baseOffset, 17285645.
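
An invocation like the following (a sketch, analogous to the index dump above) produces the per-batch metadata shown below; adding --deep-iteration or --print-data-log would decode individual records as well:

sh /opt/kafka_2.12-2.4.1/bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files 00000000000016550653.log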

baseOffset: 17285641 lastOffset: 17285641 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619847393 CreateTime: 1635164485785 size: 862 magic: 2 compresscodec: NONE crc: 2631550501 isvalid: true
baseOffset: 17285642 lastOffset: 17285642 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619848255 CreateTime: 1635164485797 size: 820 magic: 2 compresscodec: NONE crc: 2213581215 isvalid: true
baseOffset: 17285643 lastOffset: 17285643 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619849075 CreateTime: 1635164485815 size: 820 magic: 2 compresscodec: NONE crc: 1485663617 isvalid: true
baseOffset: 17285644 lastOffset: 17285644 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619849895 CreateTime: 1635164485814 size: 862 magic: 2 compresscodec: NONE crc: 628712256 isvalid: true
baseOffset: 17285645 lastOffset: 17285645 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619850757 CreateTime: 1635164485814 size: 820 magic: 2 compresscodec: NONE crc: 224181728 isvalid: true
baseOffset: 17285645 lastOffset: 17285645 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619851577 CreateTime: 1635164485796 size: 862 magic: 2 compresscodec: NONE crc: 3517102820 isvalid: true
baseOffset: 17285646 lastOffset: 17285646 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619852439 CreateTime: 1635164485873 size: 862 magic: 2 compresscodec: NONE crc: 1741360490 isvalid: true
baseOffset: 17285647 lastOffset: 17285647 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619853301 CreateTime: 1635164485906 size: 820 magic: 2 compresscodec: NONE crc: 1448756446 isvalid: true
baseOffset: 17285648 lastOffset: 17285648 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619854121 CreateTime: 1635164485913 size: 862 magic: 2 compresscodec: NONE crc: 2944550720 isvalid: true
baseOffset: 17285649 lastOffset: 17285649 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 9 isTransactional: false isControl: false position: 619854983 CreateTime: 1635164485914 size: 820 magic: 2 compresscodec: NONE crc: 663544851 isvalid: true

I wonder what can cause the .log file to have a duplicate offset, and how I can resolve this problem.
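
A possible mitigation would be to advance logStartOffset past the corrupt batch with the stock delete-records tool, at the cost of discarding everything below the new start offset. A sketch follows; the file name offsets.json and the target offset 17285651 are my own choices for illustration and would have to be picked carefully:

# partition 9 of the affected topic; the offset here is only an example
cat > offsets.json <<'EOF'
{"partitions": [{"topic": "viid_msgHandleVsts_uos", "partition": 9, "offset": 17285651}], "version": 1}
EOF
/opt/kafka_2.12-2.4.1/bin/kafka-delete-records.sh \
  --bootstrap-server 45.4.34.24:29092 \
  --offset-json-file offsets.json

I am not sure this is safe to run against a corrupt segment, which is part of what I am asking.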

Kafka server config

advertised.host.name = null
advertised.listeners = null
advertised.port = null
alter.config.policy.class.name = null
alter.log.dirs.replication.quota.window.num = 11
alter.log.dirs.replication.quota.window.size.seconds = 1
authorizer.class.name = 
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
background.threads = 16
broker.id = 1
broker.id.generation.enable = true
broker.rack = null
client.quota.callback.class = null
compression.type = producer
connection.failed.authentication.delay.ms = 100
connections.max.idle.ms = 600000
connections.max.reauth.ms = 0
control.plane.listener.name = null
controlled.shutdown.enable = true
controlled.shutdown.max.retries = 3
controlled.shutdown.retry.backoff.ms = 5000
controller.socket.timeout.ms = 30000
create.topic.policy.class.name = null
default.replication.factor = 2
delegation.token.expiry.check.interval.ms = 3600000
delegation.token.expiry.time.ms = 86400000
delegation.token.master.key = null
delegation.token.max.lifetime.ms = 604800000
delete.records.purgatory.purge.interval.requests = 1
delete.topic.enable = true
fetch.purgatory.purge.interval.requests = 1000
group.initial.rebalance.delay.ms = 3000
group.max.session.timeout.ms = 300000
group.max.size = 2147483647
group.min.session.timeout.ms = 6000
host.name = 
inter.broker.listener.name = null
inter.broker.protocol.version = 2.4-IV1
kafka.metrics.polling.interval.secs = 10
kafka.metrics.reporters = []
leader.imbalance.check.interval.seconds = 300
leader.imbalance.per.broker.percentage = 10
listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
listeners = PLAINTEXT://45.4.34.24:29092
log.cleaner.backoff.ms = 15000
log.cleaner.dedupe.buffer.size = 134217728
log.cleaner.delete.retention.ms = 86400000
log.cleaner.enable = true
log.cleaner.io.buffer.load.factor = 0.9
log.cleaner.io.buffer.size = 524288
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
log.cleaner.max.compaction.lag.ms = 9223372036854775807
log.cleaner.min.cleanable.ratio = 0.5
log.cleaner.min.compaction.lag.ms = 0
log.cleaner.threads = 1
log.cleanup.policy = [delete]
log.dir = /var/lib/kafka
log.dirs = null
log.flush.interval.messages = 9223372036854775807
log.flush.interval.ms = null
log.flush.offset.checkpoint.interval.ms = 60000
log.flush.scheduler.interval.ms = 9223372036854775807
log.flush.start.offset.checkpoint.interval.ms = 60000
log.index.interval.bytes = 4096
log.index.size.max.bytes = 10485760
log.message.downconversion.enable = true
log.message.format.version = 2.4-IV1
log.message.timestamp.difference.max.ms = 9223372036854775807
log.message.timestamp.type = CreateTime
log.preallocate = false
log.retention.bytes = -1
log.retention.check.interval.ms = 300000
log.retention.hours = 168
log.retention.minutes = null
log.retention.ms = null
log.roll.hours = 168
log.roll.jitter.hours = 0
log.roll.jitter.ms = null
log.roll.ms = null
log.segment.bytes = 1073741824
log.segment.delete.delay.ms = 60000
max.connections = 2147483647
max.connections.per.ip = 10000
max.connections.per.ip.overrides = 10.68.7.1:2147483647
max.incremental.fetch.session.cache.slots = 1000
message.max.bytes = 10485760
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
min.insync.replicas = 1
num.io.threads = 16
num.network.threads = 8
num.partitions = 3
num.recovery.threads.per.data.dir = 1
num.replica.alter.log.dirs.threads = null
num.replica.fetchers = 2
offset.metadata.max.bytes = 4096
offsets.commit.required.acks = -1
offsets.commit.timeout.ms = 5000
offsets.load.buffer.size = 5242880
offsets.retention.check.interval.ms = 600000
offsets.retention.minutes = 10080
offsets.topic.compression.codec = 0
offsets.topic.num.partitions = 50
offsets.topic.replication.factor = 2
offsets.topic.segment.bytes = 104857600
password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
password.encoder.iterations = 4096
password.encoder.key.length = 128
password.encoder.keyfactory.algorithm = null
password.encoder.old.secret = null
password.encoder.secret = null
port = 9092
principal.builder.class = null
producer.purgatory.purge.interval.requests = 1000
queued.max.request.bytes = -1
queued.max.requests = 500
quota.consumer.default = 9223372036854775807
quota.producer.default = 9223372036854775807
quota.window.num = 11
quota.window.size.seconds = 1
replica.fetch.backoff.ms = 1000
replica.fetch.max.bytes = 1048576
replica.fetch.min.bytes = 1
replica.fetch.response.max.bytes = 10485760
replica.fetch.wait.max.ms = 500
replica.high.watermark.checkpoint.interval.ms = 5000
replica.lag.time.max.ms = 10000
replica.selector.class = null
replica.socket.receive.buffer.bytes = 65536
replica.socket.timeout.ms = 30000
replication.quota.window.num = 11
replication.quota.window.size.seconds = 1
request.timeout.ms = 30000
reserved.broker.max.id = 1000
sasl.client.callback.handler.class = null
sasl.enabled.mechanisms = [GSSAPI]
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.principal.to.local.rules = [DEFAULT]
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism.inter.broker.protocol = GSSAPI
sasl.server.callback.handler.class = null
security.inter.broker.protocol = PLAINTEXT
security.providers = null
socket.receive.buffer.bytes = 102400
socket.request.max.bytes = 104857600
socket.send.buffer.bytes = 102400
ssl.cipher.suites = []
ssl.client.auth = none
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.principal.mapping.rules = DEFAULT
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
transaction.max.timeout.ms = 900000
transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
transaction.state.log.load.buffer.size = 5242880
transaction.state.log.min.isr = 2
transaction.state.log.num.partitions = 50
transaction.state.log.replication.factor = 3
transaction.state.log.segment.bytes = 104857600
transactional.id.expiration.ms = 604800000
unclean.leader.election.enable = true
zookeeper.connect = 45.4.34.24:22181,45.4.34.25:22181,45.4.34.26:22181
zookeeper.connection.timeout.ms = 240000
zookeeper.max.in.flight.requests = 10
zookeeper.session.timeout.ms = 6000
zookeeper.set.acl = false
zookeeper.sync.time.ms = 2000
-- nobb
apache-kafka
kubernetes

0 Answers