Kafka & ZooKeeper deployment in production in K8s

7/24/2017

Does anyone have any suggestions for Kafka and ZooKeeper configuration for a production environment?

I had a look online and there are some links covering configuration in terms of compression, RAM, etc., but nothing related to the number of ZooKeeper and Kafka instances...

My deployment has 5 zoo and 3 kafka instances:

NAME      READY     STATUS              RESTARTS   AGE
kafka-0   0/1       Running             0          12s
kafka-1   0/1       Running             0          12s
kafka-2   0/1       Running             0          12s
zoo-0     0/1       Running             0          12s
zoo-1     0/1       Running             0          12s
zoo-2     0/1       Running             0          12s
zoo-3     0/1       Running             0          12s
zoo-4     0/1       Running             0          12s   

What I gathered is that it is preferable to deploy a ZooKeeper cluster on its own and then point Kafka to it. What about cluster communication on Kubernetes?
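For context, the pods above come from StatefulSets, and my understanding is that inter-pod communication goes through a headless Service per StatefulSet, roughly like the sketch below (simplified, with the standard ZooKeeper ports; not my exact manifest):

  apiVersion: v1
  kind: Service
  metadata:
    name: zoo
  spec:
    clusterIP: None   # headless: each pod gets a stable DNS name, e.g. zoo-0.zoo.<namespace>.svc.cluster.local
    selector:
      app: zoo
    ports:
    - name: client
      port: 2181
    - name: server
      port: 2888
    - name: leader-election
      port: 3888

With that, a broker in the same namespace can reach the ensemble as zoo-0.zoo:2181,zoo-1.zoo:2181,...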

Any help or advice is appreciated - Thanks

-- Prisco
apache-kafka
configuration
kubernetes
production-environment
zk

1 Answer

7/26/2017

I am not an expert at this, but will take a first stab.

One thing I did not understand is why you have more ZooKeeper nodes than Kafka brokers. ZooKeeper is the metadata store for Kafka, so you can start with a single ZooKeeper node; to handle failover, go with 3 ZooKeeper nodes.
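For reference, a 3-node ensemble is just three server entries in zoo.cfg, along these lines (a minimal sketch; the hostnames reuse the pod names from the question and are placeholders, and each node also needs a matching myid file in dataDir):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zoo-0:2888:3888
  server.2=zoo-1:2888:3888
  server.3=zoo-2:2888:3888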

So for a simple Kafka production cluster, you can start with 3 or 5 Kafka brokers and 3 ZooKeeper nodes.

Kafka disk size should depend on the retention you want to have. The number of brokers should depend on the parallelism (partition count and throughput) you need.
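As a rough sizing sketch (all numbers below are made up for illustration):

  ingest rate          : 50 MB/s
  retention            : 7 days = 604,800 s
  replication factor   : 3
  raw data             : 50 MB/s x 604,800 s ~ 30 TB
  with replication     : 30 TB x 3           ~ 90 TB total
  per broker (5 nodes) : 90 TB / 5           ~ 18 TB, plus headroom for indexes and growth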

I am not familiar with Kubernetes, so I am not sure about that part. But in general, each cluster that shares a ZooKeeper ensemble should have its own separate root folder, and 3 ZooKeeper nodes are a good starting point. (Note: you might want to try SSDs for the ZooKeeper disk; some say it helps, some say there is no improvement, so I recommend trying it and verifying.)
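In Kafka terms that "separate root folder" is a chroot suffix on the broker's zookeeper.connect setting, for example (hostnames and path are illustrative):

  # server.properties on each broker
  zookeeper.connect=zoo-0:2181,zoo-1:2181,zoo-2:2181/kafka-cluster-1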

Going to production, I would be more concerned about monitoring and making sure the service doesn't go down. You can help ensure that by doing the following (example commands for each point are after the list):

  1. Kafka is a write-ahead log, so make sure your retention.bytes and retention.ms are set appropriately at both the cluster level and the topic level
  2. Monitor for lag, consumer lag specifically
  3. Keep leader distribution even; monitor it, because after any broker restart the distribution can become uneven
  4. If you are handling a lot of data, pay attention to compression
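Some concrete handles for the points above, using the scripts that ship with the Apache Kafka distribution (hostnames, topic and group names are placeholders):

  # 1. retention at the topic level (cluster-wide defaults are log.retention.ms / log.retention.bytes in server.properties)
  kafka-configs.sh --zookeeper zoo-0:2181 --alter --entity-type topics \
      --entity-name my-topic --add-config retention.ms=604800000,retention.bytes=107374182400

  # 2. check consumer lag per group
  kafka-consumer-groups.sh --bootstrap-server kafka-0:9092 --describe --group my-group

  # 3. move leadership back to the preferred replicas after a broker restart
  #    (the broker can also do this on its own via auto.leader.rebalance.enable)
  kafka-preferred-replica-election.sh --zookeeper zoo-0:2181

  # 4. compression can be set per topic or on the producer via compression.type (e.g. lz4)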
-- supermonk
Source: StackOverflow