Kafka/Kubernetes and Autoscale

11/6/2019

I have a question about Kafka in Kubernetes, specifically autoscaling...

Let's say I have 3 Kafka brokers in 3 Pods in Kubernetes, and there is a TopicA with 5 partitions (P1, P2, P3, P4, P5) and a replication factor of 3. All brokers have their own Persistent Volumes, and I have autoscaling configured in Kubernetes so that if it detects, let's say, 80% CPU/memory usage in the Kafka Pods, it will start additional Pods for Kafka brokers...

If I am not completely wrong, Kafka will detect the extra instances via ZooKeeper and can shift partitions. So let's say P1 and P2 were on Broker1, P3 and P4 were on Broker2, and P5 was on Broker3. When a new Pod comes into the picture, I would expect something like the following: P1 on Broker1; P3 and P4 on Broker2; P5 on Broker3; and P2 on Broker4.

So my first question is: are the above assumptions correct, and does Kafka behave like this or not?

My second question is about scaling down. Let's say the load peak is gone and we don't need Pod4 anymore. Can Kubernetes shut down the Pod, and can Kafka return to the 3-broker configuration? That is the part I am not sure of. Since I have a replication factor of 3, the 2 other brokers should be able to continue to work, but can Kafka pull partition P2 back to Broker 1, 2, or 3?

And the final question would be: if Kubernetes has spawned Pods 5, 6, and 7, can we scale down to the 3-Pod configuration again?

Thx for answers..

-- posthumecaver
apache-kafka
kubernetes

1 Answer

11/6/2019

Kafka will detect over Zookeeper extra instances and can shift Partitions

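Partitions will not be rebalanced automatically when expanding a cluster. A new broker holds no data for existing topics until partitions are explicitly reassigned to it.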
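To make the manual step concrete, here is a rough sketch of building a reassignment plan that spreads TopicA over four brokers. It assumes broker IDs 1 to 4 and a simple round-robin placement; the function name `round_robin_plan` is made up for illustration, and Kafka numbers partitions from 0 rather than from 1 as in the question. The output uses the JSON layout that `kafka-reassign-partitions.sh` reads via `--reassignment-json-file`.

```python
import json


def round_robin_plan(topic, num_partitions, brokers, replication_factor):
    """Build a reassignment plan placing replicas round-robin over brokers.

    Hypothetical helper: the placement strategy is an illustration,
    not what Kafka or any operator does internally.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    partitions = []
    for p in range(num_partitions):
        # The first broker in the list is the preferred leader.
        replicas = [
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Assumed broker IDs 1-4, where 4 is the newly added broker.
plan = round_robin_plan("TopicA", 5, [1, 2, 3, 4], 3)
print(json.dumps(plan, indent=2))
```

The printed JSON would be saved to a file and passed to the reassignment tool with `--execute`. Until something like that runs, the new broker only receives partitions of topics created after it joined.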

In the case of a downscale, partitions must be moved off the brokers before those brokers can be removed from the cluster; otherwise you'll have permanently offline partitions that cannot replicate. You also need to be conscious of disk utilization when shrinking a cluster, as a partition is limited in size by the smallest data directory.
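As a sketch of the "move partitions off first" step, the snippet below rewrites a replica assignment so that nothing remains on the broker being removed, and checks whether a broker is empty before it is shut down. The helper names `drain_broker` and `safe_to_remove`, the broker IDs, and the sample assignment are all assumptions for illustration; it only computes a target assignment and does not talk to Kafka or consider disk usage.

```python
def drain_broker(assignment, broker_to_remove, remaining_brokers):
    """Return a new assignment with no replicas on broker_to_remove.

    assignment maps partition number -> list of broker IDs.
    Hypothetical helper; picks the least-loaded eligible broker.
    """
    # Count how many replicas each remaining broker already holds.
    load = {b: 0 for b in remaining_brokers}
    for replicas in assignment.values():
        for b in replicas:
            if b in load:
                load[b] += 1

    new_assignment = {}
    for partition, replicas in sorted(assignment.items()):
        updated = list(replicas)
        if broker_to_remove in updated:
            # A broker may hold at most one replica of a given partition.
            candidates = [b for b in remaining_brokers if b not in updated]
            if not candidates:
                raise ValueError(f"no spare broker for partition {partition}")
            target = min(candidates, key=lambda b: load[b])
            updated[updated.index(broker_to_remove)] = target
            load[target] += 1
        new_assignment[partition] = updated
    return new_assignment


def safe_to_remove(assignment, broker):
    """True only if the broker holds no replica of any partition."""
    return all(broker not in replicas for replicas in assignment.values())


# Assumed current layout of TopicA across brokers 1-4.
current = {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 1], 3: [4, 1, 2], 4: [1, 2, 3]}
target = drain_broker(current, broker_to_remove=4, remaining_brokers=[1, 2, 3])
print(safe_to_remove(current, 4), safe_to_remove(target, 4))
```

Only after the real reassignment has completed, and the cluster reports no replicas left on broker 4, would it be reasonable to scale the Pod away. Note that with a replication factor of 3 you cannot go below 3 brokers at all.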

Kubernetes itself won't help Kafka perform these operations. In non-k8s environments, this process is mostly manual but can be scripted (see kafka-kit by Datadog). I believe that k8s operators such as the Strimzi operator could make data rebalances easier when scaling; however, at the moment, it doesn't support automatic reassignment.

-- OneCricketeer
Source: StackOverflow