How databases synchronize data between persistent volumes in Kubernetes

7/14/2019

I've just read the "Deploying Cassandra with Stateful Sets" topic in the Kubernetes documentation. The deployment process:

  1. Create a StorageClass.

  2. Create the PersistentVolumes (4 in my case), setting storageClassName to the StorageClass created in step 1.

  3. Create the Cassandra headless Service.

  4. Use a StatefulSet to create a Cassandra ring, setting storageClassName in the StatefulSet YAML definition to the StorageClass created in step 1.

As a result, there are 4 pods (cassandra-0, cassandra-1, cassandra-2, cassandra-3), each mounting one of the volumes created in step 2 (pv-0, pv-1, pv-2, pv-3). I wonder how, or whether, these persistent volumes synchronize data with each other.

For example, if I add a record that the cassandra-0 pod writes to persistent volume pv-0, will someone who reads from the database a moment later through the cassandra-1 pod (and its volume) see the data that was added to pv-0? Can anyone tell me how this works exactly?

-- michf
cassandra
kubernetes

4 Answers

7/15/2019

Thanks for the comments, guys! So, when I have my DB with 3 PVs:

cassandra-pod0  cassandra-pod1  cassandra-pod2
      |               |               |
cassandra-pv0   cassandra-pv1   cassandra-pv2

The data is divided among the 3 PVs. If I kill cassandra-pod1, it is possible that I will (temporarily) lose part of the data. Am I right?

-- michf
Source: StackOverflow

7/14/2019
  1. This is not related to Kubernetes

  2. Replication is done by the database and is configurable

  3. See the CAP theorem and Eventual Consistency for Cassandra

  4. You can control the consistency level in Cassandra: whether a record is updated on all replicas immediately or later depends on how you configure Cassandra.

  5. See also: Synchronous Replication, Asynchronous Replication

Cassandra Consistency:

how to set cassandra read and write consistency

How is the consistency level configured?
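
To make the consistency point concrete, here is a minimal Python sketch of the usual replica-overlap arithmetic. It is not Cassandra code: the function names are made up for illustration, and it only checks the well-known rule that a read is guaranteed to reach at least one replica holding the latest write when (replicas written) + (replicas read) > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a QUORUM: a strict majority of the copies."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_replicas: int, read_replicas: int,
                        replication_factor: int) -> bool:
    """True when any set of read replicas must share at least one
    replica with any set of replicas that acknowledged the write."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # assumed replication factor, chosen for the example
print(quorum(rf))                                        # 2
print(read_overlaps_write(1, 1, rf))                     # False: ONE + ONE
print(read_overlaps_write(quorum(rf), quorum(rf), rf))   # True: QUORUM + QUORUM
```

With a replication factor of 3, writing and reading at ONE can miss a fresh write until replication catches up, while QUORUM for both always overlaps.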

-- Ijaz Ahmad Khan
Source: StackOverflow

7/14/2019

The mechanism that spreads data across the cluster is the same whether Cassandra is deployed in Kubernetes or on bare-metal instances. Cassandra distributes data across the nodes according to a hash of the partition key (known as a token), and uses the same algorithm to locate the data when reading it.

There are other factors to take into consideration: the replication factor (number of copies) and the consistency level used.
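
As a rough illustration of token-based placement, here is a toy Python ring. It is a sketch under simplifying assumptions, not Cassandra's implementation: MD5 stands in for the real partitioner, each node gets a single token derived from its name, and replicas are simply the next nodes clockwise on the ring. The node names and the `replicas_for` helper are hypothetical.

```python
import bisect
import hashlib

# Hypothetical node names, matching the pods from the question.
NODES = ["cassandra-0", "cassandra-1", "cassandra-2", "cassandra-3"]


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 used as a stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# One token per node, sorted by ring position (a simplification).
RING = sorted((token(node), node) for node in NODES)
TOKENS = [t for t, _ in RING]


def replicas_for(partition_key: str, replication_factor: int) -> list:
    """Return the nodes that would store the key: the first node whose
    token is >= the key's token, then the following nodes clockwise."""
    start = bisect.bisect_left(TOKENS, token(partition_key)) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


for key in ("user:1", "user:2", "user:3"):
    print(key, "->", replicas_for(key, replication_factor=3))
```

In this sketch, a replication factor of 1 puts each key on exactly one node, so that key is unavailable while its node is down; with a replication factor of 3 the same key is stored on three different nodes, each writing to its own volume.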

You may want to take a look at DS201: DataStax Enterprise Foundations of Apache Cassandra™ on DataStax Academy, which covers the basics of Cassandra.

-- Carlos Monroy Nieblas
Source: StackOverflow

7/14/2019

Just to slightly extend Carlos' answer: Kubernetes is not involved, and the volumes are completely isolated. Replication and distribution are entirely up to the database software to handle. As far as K8s is concerned, these are just separate processes and separate volumes.

-- coderanger
Source: StackOverflow