Should we run a Kafka node with 3 replicas or 3 Kafka nodes with 1 replica?

4/28/2020

I don't understand a difference between running 1 Kafka node with 3 replicas and 3 Kafka nodes each with 1 replica.

We maintain our own Kubernetes cluster where we want to run a Kafka cluster. We're using a Bitnami Helm chart.

We can set:

  1. ...3 different Kafka services with 1 replica and each has its own URL (e.g. localhost:9092, localhost:9093 and localhost:9094).
  2. ...1 Kafka service running in 3 replicas (there are only 1 URL localhost:9092 for all replicas).

Is there any difference in a way of synchronization and what is a better way for configuration?

-- Prokop Simek
apache-kafka
bitnami
devops
kubernetes
kubernetes-helm

2 Answers

4/28/2020

1 Kafka node with 3 replicas are in the same machine. The data will be stored in the same server. The replication on the same Kafka server is to create a security to avoid data to be corrupted.

About 3 kafka with 1 replica is another approach. For example, if one of your servers took down, another Kafka can assume the leader position for specific topic if all data replicated from the same topic. This is one of the beauties of Kafka. If you configure in the right way, Zookeeper can do the replace and your service will not crash.

One of the best practices that you can do in production is create two zookeepers (leader electors) an put 3 or 4 kafkas in different machines and, each kafka, with 3 in replica factor. This will create a strong consistency in your data and, same if one or two servers down, your kafka will run in a certain safe way.

This happened to me. 4 kafkas, 2 down and everything still running perfectly. Also, some details need to be put on configurations. Suggest you to see about the Stephane Maarek on YT.

-- William Prigol Lopes
Source: StackOverflow

4/28/2020

In order to provide high availability and utilize Kafka's parallelism for multiple consumers, you should scale up and I would recommend 3 servers.
Multiple brokers setup will spread messages/partitions across different brokers for the same topic, so a consumer group can receive messages from different brokers with high parallelism.

Furthermore, notice that replications helps you only for high availability, so partition replica/s will take the lead in case of a server failure.
For 3 nodes cluster I would recomnend to start with 2 replicas, so one server failure will cause no message loss at all; if availability is very important to you, or you don't trust your hardware, go with 3 replicas, this way you can survive two shutdowns at the same time, but compromise more disk space on servers.

-- Ofek Hod
Source: StackOverflow