In which real world scenario would you use ReadWriteOnce over ReadWriteMany for a PVC in Kubernetes?

7/16/2021

Just as a quick reminder, said option limits how many nodes can read from or write to a volume, not how many pods can access it. You can have an RWO volume accessed by multiple pods as long as they are running on the same worker node.

Having said that, when and why would you use a ReadWriteOnce over ReadWriteMany?

I legitimately don't know and would like to understand this. RWO seems too limiting to me, as the pods would have to run on a single node.

I mean, even if your deployment contains a single instance of it (one pod), why would you not let that pod be created wherever the scheduler pleases?

This is confusing, please help.

-- Jsh0s
kubernetes
persistent-volume-claims
persistent-volumes

2 Answers

7/17/2021

I would pretty much always pick a ReadWriteOnce volume.

Mechanically, if you look at the list of volume types, the ones that are easier to set up tend to be ReadWriteOnce. If your infrastructure is running on AWS, for example, an awsElasticBlockStore volume is ReadWriteOnce; you need to set up something like an NFS server to get ReadWriteMany (arguably EFS makes this easier).

As far as your application goes, managing a shared filesystem is tricky, especially in a clustered environment. You need to be careful to not have multiple tasks writing to the same file. File locking may not work reliably. If applications are generating new files then they need to make sure to pick distinct names, and you can't reliably check if a name exists before creating a file.
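
To make the naming problem concrete, here is a minimal Python sketch (not part of the original answer) contrasting a check-then-create helper with an exclusive create. The function names `create_racy` and `create_unique` and the `upload-` prefix are made up for illustration. It assumes a filesystem that honors `O_EXCL`; on some shared or network filesystems that guarantee may be weaker, which is part of why shared storage is tricky.

```python
import os
import tempfile
import uuid


def create_racy(directory: str, name: str) -> str:
    """Check-then-create: another writer can create the file between the two steps."""
    path = os.path.join(directory, name)
    if os.path.exists(path):      # step 1: check
        raise FileExistsError(path)
    with open(path, "w"):         # step 2: create (not atomic with step 1)
        pass
    return path


def create_unique(directory: str, prefix: str = "upload-") -> str:
    """Pick a random name and let the OS reject any collision in a single call."""
    while True:
        path = os.path.join(directory, f"{prefix}{uuid.uuid4().hex}")
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the name is already taken.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            continue  # very unlikely with a UUID, but retry with a new name
        os.close(fd)
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = create_unique(d)
        second = create_unique(d)
        print(os.path.basename(first), os.path.basename(second))
```

Even with this pattern, every writer has to agree on the convention, which is one reason to put a single storage process in front of the files instead.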

So architecturally a more typical approach is to have some sort of storage management process. This could be something that presents an HTTP interface on top of a filesystem; it could be something more involved like a database; or it could be something cloud-managed (again in AWS, S3 essentially fits this need). That process handles these concurrency considerations, but since there is only one of it, it only needs ReadWriteOnce storage.

An extension of this is some sort of storage system that knows it's running in a clustered environment. At small scale, the etcd and ZooKeeper configuration systems know about this; at larger scale, dedicated cluster databases like Elasticsearch have this implementation. These can run multiple copies of themselves, but each manages a different subset of the data, and they know how to replicate the data amongst the different copies. Again, the disk storage isn't shared in this architecture; in Kubernetes you'd deploy these on a StatefulSet that created a distinct ReadWriteOnce PersistentVolumeClaim for each pod.
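
As a rough illustration of the StatefulSet pattern, the following Python sketch (not part of the original answer) builds a manifest with a `volumeClaimTemplates` entry and lists the claim names Kubernetes would derive from it. The names `store` and `data`, the image reference, and the 1Gi size are placeholders, and the sketch assumes a headless Service matching `serviceName` is created separately.

```python
import json


def statefulset_manifest(name: str, replicas: int, size: str = "1Gi") -> dict:
    """Build a StatefulSet whose volumeClaimTemplates yields one RWO claim per pod."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumption: a headless Service with this name exists
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": "example.invalid/clustered-store:latest",  # placeholder
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def claim_names(manifest: dict) -> list[str]:
    """Claim names follow <template name>-<statefulset name>-<ordinal>."""
    sts = manifest["metadata"]["name"]
    return [
        f'{tpl["metadata"]["name"]}-{sts}-{i}'
        for tpl in manifest["spec"]["volumeClaimTemplates"]
        for i in range(manifest["spec"]["replicas"])
    ]


if __name__ == "__main__":
    m = statefulset_manifest("store", replicas=3)
    print(json.dumps(m, indent=2))  # kubectl apply -f accepts JSON as well as YAML
    print(claim_names(m))
```

Each replica gets its own claim, so no volume is ever shared between pods, and ReadWriteOnce is all the storage backend has to support.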

As @Jonas notes in their answer, typically your application pods should not have any volumes attached at all. All of their data should be in a database or some other storage system. This gives you a centralized point to manage the data, and it makes it much easier to scale the application up and down if you don't need to worry about what happens to data when you delete half the pods.

-- David Maze
Source: StackOverflow

7/16/2021

when and why would you use a ReadWriteOnce over ReadWriteMany?

First, you need to consider which access modes are available for the storage system that you are using. When checking the access modes documentation [1], you quickly see that only ReadWriteOnce is widely available, so in many cases you have no other option for volumes.

But this also comes down to architecture. On Kubernetes you are often designing distributed systems, and in many cases you want each instance to own its own data, using a shared-nothing architecture. With that in mind, ReadWriteOnce is often enough for your needs.

I mean, even if your deployment contains a single instance of it (one pod), why would you not let that pod be created wherever the scheduler pleases?

There are two cases here:

  • Stateless apps - these are typically deployed without a persistent volume - hence can be scheduled wherever the scheduler pleases anyway.

  • Stateful apps - e.g. a Redis cache or a distributed database - these are a bit more sensitive to location, but you also want this control for understanding and controlling failure domains. If these apps use volumes, they are typically designed with a shared-nothing architecture in mind, so that each instance controls its own volume - hence the ReadWriteOnce access mode is what you need.

    1: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

-- Jonas
Source: StackOverflow