I have a replicated Cassandra database and would like to know the best way to persist its data.
Currently I'm using a Kubernetes emptyDir for the Cassandra container volume.
I will answer your questions in the same order:
1: You can use a Google Compute Engine persistent disk for one Cassandra node (Cassandra has no true master, so this would typically be a seed node), and all the other Cassandra replicas can just use their local emptyDir.
2: When deploying to the cloud, the expectation is that instances are ephemeral and might die at any time. Cassandra is built to replicate data across the cluster, so if an instance dies, the data it stored survives on other replicas (provided the keyspace's replication factor is greater than one), and the cluster can react by re-replicating the data to other running nodes. You can use a DaemonSet to place a single pod on each node in the Kubernetes cluster, which will give you data redundancy.
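To make point 2 concrete, here is a minimal sketch of a DaemonSet that runs one Cassandra pod per node with an emptyDir data volume. The name `cassandra`, the `app: cassandra` labels, and the `cassandra:4.1` image tag are assumptions for illustration, and seed discovery and resource limits are left out, so treat it as a starting point rather than a working cluster definition.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cassandra            # hypothetical name
  labels:
    app: cassandra
spec:
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1          # assumed image tag
          ports:
            - name: cql
              containerPort: 9042       # CQL native transport
            - name: intra-node
              containerPort: 7000       # inter-node communication
          volumeMounts:
            - name: data
              mountPath: /var/lib/cassandra   # data directory of the official image
      volumes:
        - name: data
          emptyDir: {}                  # lives only as long as the pod stays on its node
```

With a DaemonSet, the controller creates a pod on every eligible node and adds one automatically when a new node joins the cluster. Because each pod's emptyDir is deleted along with the pod, redundancy here relies entirely on Cassandra's own replication.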
Is it possible to provide more information here? How will the new pods spin up?
You can take a snapshot of the disk, or use emptyDir with a sidecar container that periodically snapshots the data directory and uploads it to Google Cloud Storage.
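The sidecar approach could look roughly like the following sketch. It assumes a hypothetical bucket name `my-cassandra-backups`, the `google/cloud-sdk:slim` image for `gsutil`, an hourly interval, and that the pod already has credentials allowed to write to the bucket (for example through the node's service account). None of these come from the setup described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-with-backup           # hypothetical name
spec:
  containers:
    - name: cassandra
      image: cassandra:4.1              # assumed image tag
      volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
    - name: backup                      # sidecar sharing the same emptyDir
      image: google/cloud-sdk:slim      # assumed image providing gsutil
      env:
        - name: BACKUP_BUCKET
          value: my-cassandra-backups   # hypothetical bucket name
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Archive the data directory and upload it, once per hour.
          while true; do
            ts=`date +%Y%m%dT%H%M%S`
            tar -czf "/tmp/cassandra-$ts.tar.gz" -C /var/lib/cassandra . \
              && gsutil cp "/tmp/cassandra-$ts.tar.gz" "gs://$BACKUP_BUCKET/$HOSTNAME/"
            rm -f "/tmp/cassandra-$ts.tar.gz"
            sleep 3600
          done
      volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
          readOnly: true                # the sidecar only reads the data
  volumes:
    - name: data
      emptyDir: {}
```

Archiving a live data directory is not guaranteed to produce a consistent copy, and `tar` may report an error if files change while it reads them, in which case this loop skips the upload for that round. A more careful setup would first run `nodetool snapshot` in the Cassandra container and upload only the resulting snapshot directories.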