What is the right method for storing files in a microservice architecture?

10/25/2018

I'm currently working on a traditional monolith application, but I am in the process of breaking it up into Spring microservices managed by Kubernetes. The application allows uploading and downloading large files, and these files are normally stored on the host filesystem. What would be the most viable method of persisting these files in a microservice architecture?

-- Matt Greene
docker
kubernetes
microservices
persistence
spring

4 Answers

10/25/2018

You have a bunch of different options; Googling your question will turn up many answers, for any budget and taste. Basically, you'd want high-availability storage like AWS S3. You could set up your own dedicated server to store these files if you wanted to cut costs, but then you'd have to worry about backups and availability. If you need low-latency access to these files, you'd want to put them behind a CDN as well.

-- Denis Pshenov
Source: StackOverflow

10/26/2018

Maybe you should have a look at the Rook project (https://rook.io/). It's easy to set up and provides different kinds of storage and persistence technologies to your cloud-native applications (CNAs).

-- Louis Baumann
Source: StackOverflow

10/26/2018

We are mostly on-prem and ended up using NFS. It was the path of least resistance, but it is probably not the most performant option, and making it highly available is tough. If you have the chance, I agree with Denis Pshenov that an S3-like system, for example MinIO, might be a better alternative.

-- Bal Chua
Source: StackOverflow

10/26/2018

There are many places to store your data. The choice depends on the budget you are able to spend (holding duplicate data means more storage, which costs money) and mostly on your business requirements.

  • Is all data needed at all times?
  • Are there geo/region-related cases?
  • How fast does a read/write operation need to be?
  • Do things need to be cached?
  • Stateful or stateless?
  • Are there operational requirements? How should this be maintained?
  • ...

Apart from this, your microservices should not know where the data is actually stored. In Kubernetes you can use Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/), which can link to storage from your cloud provider or something else. The microservice should just mount the volume and be able to treat its contents like local files.
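As a rough illustration of that idea, here is a minimal Java sketch in which the service code only talks to a small storage interface, and the implementation writes under whatever directory the volume is mounted at. The names FileStore and VolumeFileStore are hypothetical, made up for this example; they are not part of Spring or Kubernetes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical abstraction: callers see keys and streams, not the storage backend. */
interface FileStore {
    void save(String key, InputStream content) throws IOException;

    InputStream load(String key) throws IOException;
}

/** Stores files under a root directory, for example the mount path of a volume. */
class VolumeFileStore implements FileStore {
    private final Path root;

    VolumeFileStore(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Resolve the key inside the root and reject keys that escape it (e.g. "../x")
    // or that point at the root directory itself.
    private Path resolve(String key) {
        Path target = root.resolve(key).normalize();
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IllegalArgumentException("Invalid storage key: " + key);
        }
        return target;
    }

    @Override
    public void save(String key, InputStream content) throws IOException {
        Path target = resolve(key);
        Files.createDirectories(target.getParent());
        // Copies the stream straight to disk, so large uploads are not held in memory.
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public InputStream load(String key) throws IOException {
        return Files.newInputStream(resolve(key));
    }
}
```

A service would construct it with the mount path, for instance new VolumeFileStore(Path.of("/data")), where /data is an assumed mount point. Swapping in an S3-backed implementation of the same interface later would not require changes to the calling code.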

Note that cloud provider storage services already include solutions for scaling, concurrency, etc. So I would probably use a single blob storage under the hood.

However, it has to be said that there is a trend to understand a microservice as a package of data and logic coupled together, and to accept duplicating the data, which leads to better scalability.

See for more information:

-- Peter Ittner
Source: StackOverflow