Why use a headless service for Kafka in Kubernetes instead of a ClusterIP service with out-of-the-box load balancing?

1/17/2020

Most of the examples I come across for running Kafka on Kubernetes deploy it behind a headless service, but I have not yet found an answer as to why it should be headless and not ClusterIP. In my opinion, ClusterIP provides load balancing out of the box, which ensures that no single broker is always the one that gets loaded. With a headless service, the Kafka clients, whether Sarama or the Java client, seem to always pick the first IP from the DNS lookup and connect to it. Won't this be a bottleneck if 100+ clients all do the same and open connections to the first IP? Or does Kafka already handle this internally? I am still trying to understand how that really works.

-- Melwyn Jensen
apache-kafka
kubernetes
load
load-balancing

1 Answer

1/17/2020

When there is no difference between the instances of a service (replicas of a pod serving a stateless application), you can expose them under a ClusterIP service, because connecting to any replica to serve the current request is fine. This is not the case with stateful services (Kafka, databases, etc.). Each instance is responsible for its own data, and each instance might own a different partition, topic, and so on. The instances of the service are not exact "replicas". Solutions for running such stateful services on Kubernetes usually use headless services and/or StatefulSets so that each instance of the service has a unique identity. Such stateful applications usually have their own clustering technology, which relies on each instance in the cluster having a unique identity.
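To make "unique identity" concrete: a StatefulSet governed by a headless service gives each pod a stable DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>, where the pod name is the StatefulSet name plus an ordinal. The sketch below only builds those names; the StatefulSet name "kafka", the service name "kafka-headless" and the namespace "default" are hypothetical values chosen for illustration, and "cluster.local" is assumed as the cluster domain.

```python
from typing import List


def pod_dns_names(statefulset: str, service: str, namespace: str,
                  replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Build the per-pod DNS names that a headless service gives to StatefulSet pods.

    Pods are named <statefulset>-<ordinal>, with ordinals starting at 0.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names, for illustration only.
names = pod_dns_names("kafka", "kafka-headless", "default", replicas=3)
for name in names:
    print(name)
```

A ClusterIP service, by contrast, exposes a single virtual IP for all pods, so a client cannot ask for one specific broker through it.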

Now that you know why stable identities are required for stateful applications and how StatefulSets with headless services provide them, you can check how your Kafka distribution might be using them to run Kafka on Kubernetes.

This blog post explains how Strimzi does it:

For StatefulSets – which Strimzi is using to run the Kafka brokers – you can use the Kubernetes headless service to give each of the pods a stable DNS name. Strimzi is using these DNS names as the advertised addresses for the Kafka brokers. So with Strimzi:

  • The initial connection is done using a regular Kubernetes service to get the metadata.

  • The subsequent connections are opened using the DNS names given to the pods by another headless Kubernetes service.
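The two steps above also address the "everyone hits the first IP" worry from the question: the bootstrap connection is only used to fetch metadata, and after that each client talks to whichever broker leads the partition it needs. The following is a toy simulation of that flow, not a real Kafka client. All addresses, the three-partition layout and the helper names are hypothetical, and crc32 stands in for the client's real key hash (the Java client, for example, uses murmur2).

```python
import zlib

# Hypothetical cluster: broker id -> advertised address (per-pod DNS name).
BROKERS = {
    0: "kafka-0.kafka-headless.default.svc.cluster.local:9092",
    1: "kafka-1.kafka-headless.default.svc.cluster.local:9092",
    2: "kafka-2.kafka-headless.default.svc.cluster.local:9092",
}

# Hypothetical topic layout: partition number -> id of the leader broker.
PARTITION_LEADERS = {0: 0, 1: 1, 2: 2}


def fetch_metadata(bootstrap_address: str) -> dict:
    """Stand-in for a metadata request.

    Any broker can answer it, which is why the bootstrap address may sit
    behind a regular, load-balanced service.
    """
    return {"brokers": dict(BROKERS), "leaders": dict(PARTITION_LEADERS)}


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (crc32 is an illustrative stand-in hash)."""
    return zlib.crc32(key) % num_partitions


def broker_for_key(key: bytes, metadata: dict) -> str:
    """Return the advertised address of the broker leading the key's partition."""
    partition = choose_partition(key, len(metadata["leaders"]))
    leader_id = metadata["leaders"][partition]
    return metadata["brokers"][leader_id]


# Step 1: initial connection through a regular service (hypothetical name).
metadata = fetch_metadata("kafka-bootstrap.default.svc:9092")

# Step 2: each record is sent to the leader of its partition, by per-pod name.
targets = {broker_for_key(f"client-{i}".encode(), metadata) for i in range(100)}
print(sorted(targets))
```

Because the target broker is picked per partition rather than per DNS answer, produce and fetch traffic spreads across brokers according to partition leadership, which is something a ClusterIP load balancer could not do correctly on the client's behalf.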

-- Shashank V
Source: StackOverflow