The challenge is to connect a Kafka producer, running outside a Kubernetes cluster, to a Kafka cluster deployed inside that Kubernetes cluster. We have several RDBMS databases that sit on premise, and we want to stream data directly to Kafka running in Kubernetes on AWS. We have tried a few things and deployed the Confluent Open Source Platform, but nothing has worked so far. Does anyone have a clear answer to this problem?
Kafka clients need to connect to a specific broker to produce or consume messages.
The Kafka protocol lets a client connect to any broker to fetch cluster metadata. The client then connects to the specific broker that is currently the leader of the partition it wants to produce to or consume from.
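To make that concrete, here is a minimal sketch of the producer side running outside the cluster, assuming the brokers advertise an externally reachable address; `kafka-external.example.com:9094` and the topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExternalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap only needs to reach *some* broker; the metadata it returns
        // must contain addresses (the advertised listeners) that are also
        // reachable from outside the Kubernetes cluster.
        props.put("bootstrap.servers", "kafka-external.example.com:9094"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The producer looks up the partition leader from the metadata and
            // sends the record to that broker's advertised external address.
            producer.send(new ProducerRecord<>("db-changes", "table1", "some row payload"));
            producer.flush();
        }
    }
}
```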
Each Kafka pod has to be individually reachable, so you need a layer-4 load balancer per pod. The advertised listeners setting in the broker config can advertise different IPs/hostnames to internal and external clients: configure the EXTERNAL advertised listener to use the load balancer address and the INTERNAL one to use the pod IP. The internal and external listeners have to use different ports.
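As a rough illustration, the resulting per-broker settings look something like the snippet below; the hostnames and ports are placeholders, and many Kafka images expose these same settings through environment variables such as KAFKA_ADVERTISED_LISTENERS:

```properties
# One listener for traffic inside the cluster, one exposed externally (different ports).
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
# What clients are told to connect to: the pod address internally,
# the per-broker load balancer address externally.
advertised.listeners=INTERNAL://kafka-0.kafka-headless.kafka.svc:9092,EXTERNAL://broker-0-lb.example.com:9094
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
```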
Check out https://strimzi.io/, https://bitnami.com/stack/kafka, and https://github.com/confluentinc/cp-helm-charts
Update:
I was trying out installing Kafka in Kubernetes running on AWS EC2. Of confluent-operator, bitnami-kafka, and strimzi, only Strimzi configured the EXTERNAL listener in the Kafka settings to point at the load balancer.
bitnami-kafka used a headless service, which is not usable from outside the Kubernetes network. confluent-operator configures the node's IP, which makes Kafka accessible outside Kubernetes, but only to clients that can reach the EC2 instance via its private IP.
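With Strimzi, external access is declared on the Kafka custom resource. A hedged sketch using the newer listener-list syntax (field names and API versions vary between Strimzi releases; the cluster name and storage settings here are placeholders):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    storage:
      type: ephemeral
    listeners:
      # internal listener for clients inside the Kubernetes cluster
      - name: plain
        port: 9092
        type: internal
        tls: false
      # external listener: Strimzi provisions a LoadBalancer service per broker
      # and advertises the load balancer addresses to external clients
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
```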
You might have a look at deploying Kafka Connect inside of Kubernetes. Since you want to replicate data from various RDBMS databases, you need to set up source connectors.
A source connector ingests entire databases and streams table updates to Kafka topics. It can also collect metrics from all of your application servers into Kafka topics, making the data available for stream processing with low latency.
Depending on your source databases, you'd have to configure the corresponding connectors.
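For example, assuming the Confluent JDBC source connector and a PostgreSQL database (the connector class and property names depend on which connector you actually pick, and all connection details below are placeholders), the configuration you POST to the Kafka Connect REST API would look roughly like:

```json
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://onprem-db.example.com:5432/sales",
    "connection.user": "replicator",
    "connection.password": "********",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "topic.prefix": "sales-",
    "poll.interval.ms": "5000"
  }
}
```

You would then register it with something like `curl -X POST -H "Content-Type: application/json" --data @postgres-source.json http://<connect-host>:8083/connectors`, assuming the Connect REST listener is on its default port 8083.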
If you are not familiar with Kafka Connect, this article might be quite helpful as it explains the key concepts.