ReadWriteMany volumes on Kubernetes with terabytes of data

1/13/2020

We want to deploy a k8s cluster which will run ~100 IO-heavy pods at the same time. They should all be able to access the same volume.

What we tried so far:

  • CephFS
    • was very complicated to set up and hard to troubleshoot. In the end, it crashed a lot, and the cause was never entirely clear.
  • Helm NFS Server Provisioner
    • runs pretty well, but when IO peaks, a single replica is not enough. We could not get multiple replicas to work at all.
  • MinIO
    • is a great tool for creating storage buckets in k8s, but our operations require filesystem mounting. That is theoretically possible with s3fs, but since we run ~100 pods, we would need to run 100 additional s3fs sidecars. That seems like a bad idea.

There has to be some way to get 2TB of data mounted in a GKE cluster with relatively high availability?
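
A minimal sketch of the setup described above, for concreteness. All names, the image, and the storage class are hypothetical placeholders; `shared-rwx` in particular assumes some ReadWriteMany-capable backend (NFS, CephFS, or similar) is already installed in the cluster.

```yaml
# Claim for one shared volume that many pods can mount read-write.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # hypothetical name
spec:
  accessModes:
    - ReadWriteMany              # every pod mounts the same volume read-write
  storageClassName: shared-rwx   # assumption: a class backed by an RWX-capable provisioner
  resources:
    requests:
      storage: 2Ti               # roughly the 2TB mentioned above
---
# Roughly 100 IO-heavy pods, all mounting the claim above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-worker                # hypothetical name
spec:
  replicas: 100
  selector:
    matchLabels:
      app: io-worker
  template:
    metadata:
      labels:
        app: io-worker
    spec:
      containers:
        - name: worker
          image: example.com/io-worker:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-data
```

Whichever backend serves the storage class has to absorb the combined IO of all replicas, which is where the options listed above fell short.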

Filestore seems to work, but it's an order of magnitude more expensive than other solutions, and with a lot of IO operations it quickly becomes infeasible.


I considered posting this question on Server Fault, but its k8s community is a lot smaller than SO's.

-- yspreen
google-kubernetes-engine
kubernetes
kubernetes-helm
kubernetes-pvc

1 Answer

1/16/2020

I think I have a definitive answer as of Jan 2020, at least for our use case:

| Solution        | Complexity | Performance | Cost           |
|-----------------|------------|-------------|----------------|
| NFS             | Low        | Low         | Low            |
| Cloud Filestore | Low        | Mediocre?   | Per Read/Write |
| CephFS          | High*      | High        | Low            |

* You need an additional step on GKE: change the base image of the nodes to Ubuntu.

I haven't benchmarked Filestore myself, so I'll go with stringy05's response: others have trouble getting really good throughput from it.

Ceph could be a lot easier if it were supported by Helm.

-- yspreen
Source: StackOverflow