How to take a dump of data from a hosted Kafka cluster and import it into a local Kafka cluster?

1/6/2018

I have a Kafka cluster in Kubernetes with a lot of test data. I want to import some or all of that test data into my local Kafka cluster. This way, it would be easier for me to run tests in the local environment with actual data from Kubernetes.

So, is there a way to dump, for example, 5000 messages from a Kafka topic into a file and restore them into a local Kafka topic?

-- oblivion
apache-kafka
kubernetes

2 Answers

1/8/2018

The way we do it (not on Kubernetes but it does not matter in this case) is:

  1. If we need to duplicate part of the data from our production cluster to a local/test cluster, we start a Flume agent that reads from the prod Kafka cluster and pushes into the test cluster. This only works with live data (you start the copy process now and let it run for as long as needed, capturing the live traffic), or if it is OK to start reading from the EARLIEST offset, because vanilla Flume does not let you specify a range of offsets to consume from a topic (AFAIK).
  2. If we do need data from a very specific range of offsets, we run a very simple Java client (our own custom one, just a few lines of code) that seeks to the beginning offset, reads until the specified end offset of the source cluster/topic, and sends the events into the target Kafka cluster/topic.

We found these approaches simpler and more flexible than using more complex tools/frameworks like MirrorMaker.
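
As a rough illustration of approach 2, here is a minimal sketch. The answer describes a Java client; this version swaps in Python and assumes the third-party kafka-python package is installed. `copy_offset_range` is a hypothetical helper name, and the broker addresses, topic name, partition and offsets in `main()` are placeholders, not values from the setup described above.

```python
def copy_offset_range(consumer, producer, source_tp, target_topic,
                      start_offset, end_offset,
                      poll_timeout_ms=1000, max_empty_polls=5):
    """Copy records with start_offset <= offset < end_offset from one
    partition of the source cluster into target_topic on the target cluster.

    Returns the number of records copied.
    """
    consumer.assign([source_tp])            # manual assignment, no consumer group rebalancing
    consumer.seek(source_tp, start_offset)  # jump to the first offset we want

    copied = 0
    empty_polls = 0
    next_offset = start_offset
    while next_offset < end_offset and empty_polls < max_empty_polls:
        batches = consumer.poll(timeout_ms=poll_timeout_ms)
        records = batches.get(source_tp, [])
        if not records:
            # Nothing arrived; give up after a few tries so we do not
            # wait forever when the topic ends before end_offset.
            empty_polls += 1
            continue
        empty_polls = 0
        for record in records:
            if record.offset >= end_offset:
                next_offset = record.offset  # past the range, stop the outer loop
                break
            producer.send(target_topic, key=record.key, value=record.value)
            copied += 1
            next_offset = record.offset + 1

    producer.flush()  # make sure buffered records reach the target cluster
    return copied


def main():
    # Imported here so the helper above can be exercised without a broker.
    from kafka import KafkaConsumer, KafkaProducer, TopicPartition

    # Placeholder addresses and names: adjust to your own clusters.
    consumer = KafkaConsumer(bootstrap_servers="source-kafka:9092",
                             enable_auto_commit=False)
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    source = TopicPartition("test-topic", 0)

    count = copy_offset_range(consumer, producer, source, "test-topic",
                              start_offset=1000, end_offset=6000)
    print("copied", count, "records")

    consumer.close()
    producer.close()

# Call main() once the placeholders above match your clusters.
```

The sketch copies a single partition and lets the producer choose the target partition, so it would need to be run once per source partition, and the original partition assignment is not preserved unless the producer is told which partition to use.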

-- Marina
Source: StackOverflow

1/8/2018

  1. Replicator is a commercial tool that enables you to replicate topics from one cluster to another. Similar to MirrorMaker though, it's designed to replicate entire topics, not just part of them.

  2. You can use kafkacat with stdin/stdout if you just want a quick, hacky option, but you would have to make sure yourself that things like partitioning and topic config, which you'd want to match for accurate testing, are set up properly.
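
A minimal sketch of the kafkacat route, wrapped in Python so the dump and the restore are two small functions. It assumes the `kafkacat` binary is on the PATH; the function names, broker addresses, topic name and file path are hypothetical placeholders. It uses the kafkacat flags `-C`/`-P` (consume/produce), `-b` (brokers), `-t` (topic), `-o beginning`, `-c` (message count), `-e` (exit at end of topic) and `-K` (key delimiter).

```python
import subprocess


def dump_command(brokers, topic, count, key_delim="\t"):
    """Build the kafkacat argv that prints `count` messages, one per line,
    as key<delim>value, starting from the beginning of the topic."""
    return ["kafkacat", "-C", "-b", brokers, "-t", topic,
            "-o", "beginning", "-c", str(count), "-e", "-K", key_delim]


def restore_command(brokers, topic, key_delim="\t"):
    """Build the kafkacat argv that reads key<delim>value lines from stdin
    and produces them to `topic`."""
    return ["kafkacat", "-P", "-b", brokers, "-t", topic, "-K", key_delim]


def dump_to_file(brokers, topic, count, path):
    # Redirect kafkacat's stdout into the dump file.
    with open(path, "wb") as out:
        subprocess.run(dump_command(brokers, topic, count),
                       stdout=out, check=True)


def restore_from_file(brokers, topic, path):
    # Feed the dump file to kafkacat's stdin.
    with open(path, "rb") as src:
        subprocess.run(restore_command(brokers, topic),
                       stdin=src, check=True)


# Example with placeholder addresses:
#   dump_to_file("hosted-kafka:9092", "test-topic", 5000, "dump.txt")
#   restore_from_file("localhost:9092", "test-topic", "dump.txt")
```

Because the dump is line- and delimiter-based, keys or values that contain tabs or newlines will not round-trip, and the original partition and offset of each message are not carried over.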

-- Robin Moffatt
Source: StackOverflow