Data resilience of a Kafka topic on GCP pod failure

8/27/2021

I am planning to run Kafka on GCP (google cloud platform).

What I wonder is: what happens to the data in a Kafka topic when a GCP pod fails? By default a new pod will be created, but will the data in the Kafka topic be lost? How can I avoid data loss in this situation?

I appreciate any help. Thanks in advance :)

Best Regards,

-- Kamil Ismayil
apache-kafka
google-cloud-platform
kubernetes

3 Answers

8/27/2021

Kafka itself needs persistent storage, so you will probably want a cloud-native storage solution. At a high level: create a StorageClass that defines your storage requirements (replication factor, snapshot policy, performance profile), then deploy Kafka as a StatefulSet on Kubernetes, for example as sketched below.
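
A minimal sketch of the StatefulSet part. The StorageClass name `kafka-storage`, the image, and the mount path are illustrative assumptions, not a complete Kafka configuration; broker settings such as listeners and ZooKeeper/KRaft connection details are omitted:

```yaml
# Minimal Kafka StatefulSet sketch: each broker gets its own PersistentVolume
# via volumeClaimTemplates, so topic data survives pod restarts and
# rescheduling. The StorageClass "kafka-storage" is assumed to exist.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless        # a matching headless Service is assumed
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.0.1   # example image, not prescriptive
          # Broker configuration (listeners, ZooKeeper/KRaft, etc.) omitted.
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: kafka-storage
        resources:
          requests:
            storage: 100Gi
```

With a topic replication factor greater than 1 on top of this, a single pod failure should not lose data: the rescheduled pod re-attaches the same PersistentVolume.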

-- Deeptesh Bhattacharya
Source: StackOverflow

8/29/2021

I do not fully understand your purpose, but in this case you cannot guarantee Kafka's data durability when a pod fails or is evicted. Maybe you should try a plain VM with Kafka installed and configure it to be fully backed up (restorable at any time when a disaster happens).

-- tulh
Source: StackOverflow

9/6/2021

It depends on what exactly you need. It's quite a general question.

There are some ready Kafka deployments available if you use the GCP Marketplace.

As you are asking about pods, I guess you want to use Google Kubernetes Engine (GKE). On the internet you can find many guides about running Kafka on Kubernetes.

For example, you can refer to Kafka with ZooKeeper on Portworx. One of the steps there provides a StorageClass YAML. In GKE the default StorageClass uses reclaimPolicy: Delete, but you can create a new StorageClass with reclaimPolicy: Retain, which keeps the underlying disk in GCP even after the corresponding PersistentVolume is released (see the sketch below).
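
A minimal StorageClass sketch with reclaimPolicy: Retain, assuming the GKE Persistent Disk CSI driver; the class name and disk type are illustrative:

```yaml
# Hypothetical StorageClass for GKE: volumes provisioned from it keep their
# underlying GCP persistent disk even after the PersistentVolume is released.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-retain-ssd
provisioner: pd.csi.storage.gke.io     # GKE Persistent Disk CSI driver
parameters:
  type: pd-ssd                         # illustrative disk type
reclaimPolicy: Retain                  # default StorageClasses use Delete
volumeBindingMode: WaitForFirstConsumer
```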

In GCP you also have the option to create disk snapshots.
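
If you want to drive those snapshots from Kubernetes rather than from the GCP console, a hedged sketch using the CSI snapshot API could look like this. The class name, PVC name, and deletionPolicy are assumptions, and the snapshot CRDs plus the CSI driver must be available in the cluster:

```yaml
# Hypothetical VolumeSnapshotClass and VolumeSnapshot for a Kafka data PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: pd-snapshot-class
driver: pd.csi.storage.gke.io          # GKE Persistent Disk CSI driver
deletionPolicy: Retain                 # keep the GCP snapshot if this object is deleted
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-data-snapshot
spec:
  volumeSnapshotClassName: pd-snapshot-class
  source:
    persistentVolumeClaimName: data-kafka-0   # assumed PVC name from the StatefulSet above
```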

In addition, you can find some best practices for using Kafka on Kubernetes here.

-- PjoterS
Source: StackOverflow