How to separate application and data syncing implementations in Kubernetes?

2/26/2019

I want to build an application/service that uses static application data which can be updated over time. Currently, I implement this by bundling both the application and the data in the same container, which requires a redeployment whenever either the application or the data changes. I want to separate the application and data volume implementations so that I can update each independently, meaning I won't have to rebuild the application layer when the application data is updated, and vice versa.

Here are the characteristics of the Application Data and its Usage:

  • Data is not updated frequently, but it is read very frequently by the application
  • Data is not a database; it is a collection of file objects with sizes ranging from 100 MB to 4 GB, initially stored in cloud storage
  • Data stored in cloud storage serves as the single source of truth for the application data
  • The application only reads the data. Updating the data in cloud storage is an external process outside the application's scope.

So here, we are interested in syncing the data in cloud storage to a volume in a Kubernetes deployment. What's the best way to achieve this objective in Kubernetes?

I have several options in mind:

  1. Using one app container in one deployment, where the app also includes the logic for loading and updating data, pulling it from cloud storage into the container --> simple, but tightly coupled with the storage read-write implementation

  2. Using the cloud store directly from the app --> this doesn't require a container volume, but I am concerned about the large file sizes because the app is an interactive service that requires quick responses

  3. Using two containers in one deployment sharing the same volume --> allows great flexibility in the storage read-write implementation

    • one container for the application service, reading from the shared volume
    • one container for updating the data and listening for data-update requests, writing to the shared volume --> this process pulls data from cloud storage into the shared volume
  4. Using one container with a Persistent Disk

    • an external process that writes to the persistent disk (not sure how to do this yet with cloud storage/file objects; I need to find a way to sync GCS to a persistent disk)
    • one container for the application service, which reads from the mounted volume
  5. Using FUSE mounts

    • an external process which writes to cloud storage
    • a container that mounts the cloud storage via FUSE

I am currently leaning towards option 3, but I am not sure if it is the common practice for achieving my objective. Please let me know if you have better solutions.
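For reference, here is a rough sketch of what option 3 could look like as a pod spec. The image names, bucket name, and sync interval are placeholders I made up for illustration; the actual sync command would depend on the cloud storage in use (here I assume GCS and `gsutil`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sync
spec:
  containers:
  - name: app
    image: my-app:latest              # placeholder: the application service
    volumeMounts:
    - name: shared-data
      mountPath: /data
      readOnly: true                  # the app only reads the data
  - name: data-sync
    image: google/cloud-sdk:slim
    # Placeholder loop: periodically pull updated objects from the bucket.
    # A real implementation might instead listen for update requests.
    command: ["sh", "-c",
      "while true; do gsutil -m rsync -r gs://my-data-bucket /data; sleep 300; done"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  volumes:
  - name: shared-data
    emptyDir: {}                      # shared between the two containers
```

Both containers mount the same `emptyDir` volume, so whatever the sync container writes is immediately visible to the application container.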

-- dekauliya
docker
google-kubernetes-engine
kubernetes

1 Answer

2/26/2019

Yes, option 3 is the most common, but make sure you use an initContainer to copy the data from your cloud storage into a local volume. That local volume can be any of the volume types supported by Kubernetes.
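A minimal sketch of the initContainer pattern, assuming GCS as the cloud store (the bucket name and app image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Runs to completion before the app container starts,
      # so the data is in place when the service comes up.
      initContainers:
      - name: data-loader
        image: google/cloud-sdk:slim
        command: ["gsutil", "-m", "rsync", "-r", "gs://my-data-bucket", "/data"]
        volumeMounts:
        - name: app-data
          mountPath: /data
      containers:
      - name: app
        image: my-app:latest          # placeholder image
        volumeMounts:
        - name: app-data
          mountPath: /data
          readOnly: true              # the app only reads the data
      volumes:
      - name: app-data
        emptyDir: {}                  # ephemeral; re-synced on every pod start
```

With `emptyDir`, the data is re-pulled each time a pod starts, so picking up new data is as simple as restarting the pods; a persistent volume type could be substituted if re-downloading several GB on each start is too slow.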

-- Rico
Source: StackOverflow