Proper way for pods to read input files from the same persistent volume?

10/5/2021

I'm new to Kubernetes and plan to use Google Kubernetes Engine. Hypothetically speaking, let's say I have a K8s cluster with 2 worker nodes. Each node would have its own pod housing the same application. This application will grab a file from some persistent volume and generate an output file that will be pushed back into a different persistent volume. Both pods in my cluster would be doing this continuously until there are no input files left in the persistent volume to be processed. Do the pods inherently know NOT to grab the same file that one pod is already using? If not, how would I be able to account for this? I would like to avoid 2 pods using the same input file.

-- Adriano Matos
google-kubernetes-engine
kubernetes

1 Answer

10/5/2021

Do the pods inherently know NOT to grab the same file that one pod is already using?

No. Pods are just processes. Two separate processes accessing files from a shared directory are going to run into conflicts unless they have some sort of coordination mechanism.

Option 1

Have one process whose job it is to enumerate the available files. Your two workers connect to this process and receive filenames via some sort of queue/message bus/etc. When a worker finishes processing a file, it requests the next one, until all files are processed. Because only a single process is enumerating the files and handing out the work, there's no opportunity for conflict.
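A minimal sketch of this pattern, assuming an in-process queue.Queue and two threads as stand-ins for the real dispatcher and worker pods. In a cluster the queue would be an external service such as a message broker, and the names run_workers and worker-0/worker-1 are hypothetical:

```python
import queue
import threading
from typing import Dict, List


def run_workers(filenames: List[str], num_workers: int = 2) -> Dict[str, List[str]]:
    """Hand each filename to exactly one worker via a shared queue."""
    # The "dispatcher": a single place that enumerates the work.
    work: "queue.Queue[str]" = queue.Queue()
    for name in filenames:
        work.put(name)

    # Hypothetical worker IDs; each worker records what it processed.
    processed: Dict[str, List[str]] = {
        f"worker-{i}": [] for i in range(num_workers)
    }

    def worker(worker_id: str) -> None:
        while True:
            try:
                # get_nowait() removes the item, so no other worker can receive it.
                name = work.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            processed[worker_id].append(name)  # placeholder for real processing

    threads = [threading.Thread(target=worker, args=(wid,)) for wid in processed]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Calling run_workers(["f1", "f2", "f3"]) returns a mapping from worker ID to the files that worker handled. Which worker gets which file varies between runs, but every file appears exactly once.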

Option 2

In general, renaming a file within a single filesystem is an atomic operation. Each worker creates its own subdirectory within your PV. To claim a file, a worker renames the file into its subdirectory and then processes it. Because renames are atomic, even if both workers happen to pick the same file at the same time, only one rename will succeed; the other worker gets an error and moves on to the next file.
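A rough sketch of the claim-by-rename idea, assuming the input directory and the per-worker claim directory live on the same filesystem (the same PV). The function name claim_next_file and the directory layout are hypothetical:

```python
import os
from pathlib import Path
from typing import Optional


def claim_next_file(input_dir: Path, claim_dir: Path) -> Optional[Path]:
    """Try to claim one file by renaming it into this worker's directory.

    Returns the new path of the claimed file, or None if nothing is left.
    """
    claim_dir.mkdir(parents=True, exist_ok=True)
    for candidate in sorted(input_dir.iterdir()):
        if not candidate.is_file():
            continue  # skip the workers' own subdirectories
        target = claim_dir / candidate.name
        try:
            # The rename either fully succeeds or fails; there is no half-claimed state.
            os.rename(candidate, target)
        except FileNotFoundError:
            # Another worker renamed this file first; try the next one.
            continue
        return target
    return None
```

Each worker would call claim_next_file in a loop with its own claim directory, for example input_dir / "worker-1", and stop when it returns None.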

Option 3

If your files have some sort of systematic naming convention, you can divide the list of files between your two workers (e.g., "everything that ends in an even number is processed by worker 1, and everything that ends with an odd number is processed by worker 2").
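As an illustration, assuming filenames end in a number just before the extension (for example input-0042.csv), each worker could decide ownership like this. The function owner_of and the zero-based worker index are hypothetical, so index 0 plays the role of "worker 1" above and index 1 the role of "worker 2":

```python
import re
from pathlib import PurePath
from typing import Optional

# Matches a run of digits at the very end of the filename stem.
_TRAILING_NUMBER = re.compile(r"(\d+)$")


def owner_of(filename: str, num_workers: int = 2) -> Optional[int]:
    """Return the zero-based index of the worker responsible for filename.

    Returns None when the name does not follow the assumed convention,
    so such files can be reported instead of silently skipped.
    """
    stem = PurePath(filename).stem  # "input-0042.csv" -> "input-0042"
    match = _TRAILING_NUMBER.search(stem)
    if match is None:
        return None
    return int(match.group(1)) % num_workers
```

A worker that knows its own index would process only the files for which owner_of(name) equals that index. With two workers this reduces to the even/odd split described above.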


There are many other ways to coordinate this sort of activity. The Wikipedia entry on Synchronization may be of interest.

-- larsks
Source: StackOverflow