Does anyone have suggestions for configuring Kafka and ZooKeeper (ZK) for a production environment?
I had a look online and found some links covering configuration in terms of compression, RAM, etc., but nothing about the number of ZK and Kafka instances.
My deployment has 5 zoo and 3 kafka instances:
NAME      READY   STATUS    RESTARTS   AGE
kafka-0   0/1     Running   0          12s
kafka-1   0/1     Running   0          12s
kafka-2   0/1     Running   0          12s
zoo-0     0/1     Running   0          12s
zoo-1     0/1     Running   0          12s
zoo-2     0/1     Running   0          12s
zoo-3     0/1     Running   0          12s
zoo-4     0/1     Running   0          12s
From what I understand, it is preferable to deploy a ZK cluster on its own and then point Kafka at it. What about cluster communication on Kubernetes?
Any help or advice is appreciated. Thanks.
I am not an expert at this, but will take a first stab.
One thing I did not understand is why you have more ZooKeeper nodes than Kafka nodes. ZooKeeper is the metadata store for Kafka, so you can start with one ZooKeeper node; to handle failover you can go with 3, since a 3-node ensemble still has a majority when one node is down.
So for a simple Kafka production cluster, you can start with 3 or 5 Kafka nodes and 3 ZooKeeper nodes.
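To make the failover reasoning concrete, here is a minimal sketch of the majority arithmetic a ZooKeeper ensemble relies on. The function names are my own, purely for illustration; the only fact assumed is that the ensemble needs a strict majority of its nodes to stay available.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many nodes."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for nodes in (1, 3, 4, 5):
        print(
            f"{nodes} node(s): quorum={quorum_size(nodes)}, "
            f"tolerates {tolerated_failures(nodes)} failure(s)"
        )
```

Note that 4 nodes tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice. Going from 3 to 5 buys you one extra tolerated failure, not extra Kafka throughput.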
Kafka disk size should depend on the retention you want to have. The number of nodes should depend on the parallelism you want to have.
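As a rough illustration of how retention drives disk size, here is a back-of-the-envelope sketch. The function, its parameters, and the example numbers are all hypothetical; it assumes partitions are spread evenly across brokers and ignores compression and index overhead, so treat the result as a starting estimate to verify against your own traffic.

```python
def disk_bytes_per_broker(
    ingest_bytes_per_sec: float,
    retention_hours: float,
    replication_factor: int,
    broker_count: int,
    headroom: float = 0.3,
) -> float:
    """Estimate the disk each broker needs to hold the retained data.

    Assumes data is spread evenly across brokers; `headroom` is an
    extra safety margin expressed as a fraction (0.3 means +30%).
    """
    retained = ingest_bytes_per_sec * retention_hours * 3600
    replicated = retained * replication_factor
    return replicated / broker_count * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical workload: 5 MB/s ingest, 7 days retention,
    # replication factor 3, spread over 3 brokers.
    estimate = disk_bytes_per_broker(5e6, 168, 3, 3)
    print(f"~{estimate / 1e12:.1f} TB per broker")
```

With replication factor 3 on 3 brokers, every broker ends up holding a full copy of the retained data, so adding brokers is what reduces the per-broker disk requirement.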
I am not familiar with Kubernetes, so I am not sure about that part. But ZooKeeper in general should have a separate root folder (chroot path) for each system that uses it, and 3 ZooKeeper nodes should be good to start. (Note: you might want to try SSDs for the ZooKeeper disk. Some say it helps, some say there is no improvement; I recommend trying it and verifying.)
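For the separate root folder, Kafka lets you append a chroot path to the end of its `zookeeper.connect` setting. Below is a small sketch that builds such a value. The helper name, the `/kafka` path, and the host names are assumptions on my part (the hosts imagine a headless Kubernetes service called `zoo` in front of the `zoo-N` pods); adjust them to whatever your deployment actually exposes.

```python
from typing import Iterable


def zookeeper_connect(
    hosts: Iterable[str], port: int = 2181, chroot: str = "/kafka"
) -> str:
    """Build a value for Kafka's `zookeeper.connect` broker setting.

    The format is host1:port,host2:port,.../chroot, where the chroot
    path is written once, after the last host.
    """
    if not chroot.startswith("/"):
        raise ValueError("chroot must be an absolute path such as '/kafka'")
    servers = ",".join(f"{host}:{port}" for host in hosts)
    if not servers:
        raise ValueError("at least one ZooKeeper host is required")
    return f"{servers}{chroot}"


if __name__ == "__main__":
    # Hypothetical DNS names for three ZooKeeper pods.
    hosts = [f"zoo-{i}.zoo" for i in range(3)]
    print("zookeeper.connect=" + zookeeper_connect(hosts))
```

Keeping Kafka under its own path means another system can share the same ensemble later without its znodes mixing with Kafka's.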
Going to production, I would be more concerned about monitoring and making sure that the service doesn't go down. You can ensure that by doing the following: