I'm currently investigating using dynamically provisioned persistent disks on GCE. My application has 1-n pods, where each pod contains a single container that needs read-write access to a persistent volume. The volume needs to be pre-populated with data copied from a bucket.
What I'm confused about is: if the persistent disk is dynamically provisioned, how do I ensure that the data is copied onto it before it is mounted to my pod? The copying of the data is infrequent but regular; the only time I might need to do this out of sequence is if a pod falls over and I need a new persistent disk and pod to take its place.
How do I ensure the persistent disk is pre-populated before it is mounted to my pod?
My current thought is to have the bucket mounted to the pod and, as part of the pod's startup, copy from the bucket to the persistent disk. This creates another problem, in that the bucket cannot be write-enabled and mounted to multiple pods at once.
Note: I'm using a separate persistent disk as I need it to be an SSD for speed.
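For the SSD requirement, something like the following `StorageClass` is what I have in mind (a sketch; the name `ssd` is just a placeholder):

```yaml
# StorageClass for dynamically provisioned SSD persistent disks on GCE.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd   # SSD-backed persistent disk rather than the default pd-standard
```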
Looks like the copy is a good candidate to be done as an "init container".
That way on every pod start, the "init container" would connect to the GCS bucket and check the status of the data, and if required, copy the data to the dynamically assigned PersistentDisk.
When completed, the main container of the pod starts, with data ready for it to use. By using an "init container" you are guaranteeing that:
- The copy is complete before your main pod container starts.
- The main container does not need access to GCS, only to the dynamically created PV.
- If the "init container" fails to complete successfully, your pod fails to start and is left in an error state.
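As a sketch of what that looks like (the image, bucket name, paths, and claim name below are all placeholders you'd replace with your own):

```yaml
# Pod whose init container seeds the dynamically provisioned PV from a GCS bucket.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-seed-data
spec:
  initContainers:
  - name: seed-data
    image: google/cloud-sdk:slim   # provides gsutil
    # rsync is idempotent: on restart it only copies what is missing or changed
    command: ["sh", "-c", "gsutil -m rsync -r gs://my-bucket/data /data"]
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: app
    image: my-app:latest           # your application image; no GCS access needed
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-ssd-claim      # dynamically provisioned SSD-backed PVC
```

Only the init container needs credentials for the bucket; the main container just sees the populated volume at `/data`.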
Used in conjunction with a StatefulSet of N pods, this approach works well: each new replica can be initialized with its own new disk, and the persistent data survives updates to the main container image (code).
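Putting the pieces together, a StatefulSet with a `volumeClaimTemplates` entry gives every replica its own dynamically provisioned SSD disk, seeded by the init container on first start (again a sketch; names, sizes, and the `ssd` StorageClass are assumptions):

```yaml
# StatefulSet: each replica gets its own PVC, pre-populated by the init container.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app
  replicas: 3
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      initContainers:
      - name: seed-data
        image: google/cloud-sdk:slim
        command: ["sh", "-c", "gsutil -m rsync -r gs://my-bucket/data /data"]
        volumeMounts:
        - name: data
          mountPath: /data
      containers:
      - name: app
        image: my-app:latest
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:        # one PVC per replica, created on demand
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ssd    # assumes a StorageClass backed by pd-ssd
      resources:
        requests:
          storage: 10Gi
```

Because the PVCs outlive the pods, rolling out a new application image re-runs the init container against the existing disk, where the idempotent `rsync` is a cheap no-op if the data is already current.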