I am planning to run Kafka on GCP (Google Cloud Platform).
What I am wondering is what happens to the data in a Kafka topic when a pod fails. By default a new pod will be created, but will the data in the Kafka topic be lost? How can I avoid data loss in this situation?
I appreciate any help. Thanks in advance :)
Best Regards,
Kafka itself needs persistent storage, so you will probably need a cloud-native storage solution. Create a StorageClass that defines your storage requirements, such as replication factor, snapshot policy, and performance profile. Then, at a high level, deploy Kafka as a StatefulSet on Kubernetes so each broker gets its own persistent volume.
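For illustration, here is a minimal sketch of what that could look like on GKE. The StorageClass name, disk type, image, replica count and mount path are all assumptions, not prescribed values; a third-party driver such as Portworx would expose its replication and snapshot settings through the `parameters:` section instead.

```yaml
# Hypothetical StorageClass using GKE's Persistent Disk CSI driver.
# reclaimPolicy: Retain keeps the underlying GCP disk even if the
# PersistentVolumeClaim is deleted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-ssd               # assumed name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd                  # performance profile; pd-standard also works
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
# Skeleton StatefulSet: each broker pod gets its own PVC from the
# StorageClass above via volumeClaimTemplates, so a rescheduled pod
# reattaches the same disk and keeps its log segments.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.5.0   # example image, not prescriptive
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: kafka-ssd
        resources:
          requests:
            storage: 100Gi
```

The key part is volumeClaimTemplates: the topic data lives on a PersistentVolume rather than inside the pod, so a recreated pod comes back with its data intact.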
I don't understand your purpose exactly, but in this case you cannot guarantee Kafka's data durability when a pod fails or is evicted. Maybe you should try a plain VM with Kafka installed and configure it to be fully backed up, so it can be restored at any time when a disaster happens.
It depends on what exactly you need. It's quite a general question.
There are some ready-made Kafka deployments available if you use the Marketplace.
As you are asking about pods, I guess you want to use Google Kubernetes Engine. On the internet you can find many guides about running Kafka on Kubernetes.
For example, you can refer to Kafka with Zookeeper on Portworx. One of the steps there provides a StorageClass YAML. In GKE the default StorageClass uses the Delete reclaim policy, but you can create a new StorageClass with reclaimPolicy: Retain, which will keep the underlying disk in GCP even after the PersistentVolumeClaim (and the pod using it) is deleted.
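A minimal sketch of such a StorageClass, assuming the GCE Persistent Disk provisioner; the class name and disk type are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-retain            # placeholder name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard             # or pd-ssd
reclaimPolicy: Retain           # GKE's default "standard" class uses Delete
```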
In GCP you also have the option to create a disk snapshot.
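On GKE you can also trigger such a disk snapshot from inside Kubernetes through the VolumeSnapshot API. This is only a sketch: it assumes the CSI snapshot controller is installed and that a VolumeSnapshotClass for the PD CSI driver exists in your cluster, and the class and PVC names are placeholders.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-data-snapshot
spec:
  volumeSnapshotClassName: pd-csi-snapshot-class   # assumed VolumeSnapshotClass name
  source:
    persistentVolumeClaimName: data-kafka-0        # placeholder PVC of the first Kafka broker
```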
In addition, you can find some best practices for using Kafka on Kubernetes here.