How to set up Kubernetes for a parallel job working on a collection of files

3/19/2018

I'm basically looking for someone who can point me in the right direction for setting up Kubernetes to perform a common computation on many work items, where each work item is a separate file.

I have been reading the documentation here, and it seems to suggest that this is possible. The examples are shown with queues of words and simply print the words; however, I am having trouble with persistent volumes.

What I need to end up with is a deployment that takes a large file of data points and splits it into several smaller files. I then want a Job object to run several pods, one per file, each performing the computation before passing its file back to the deployment for post-processing.
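For concreteness, here is a rough sketch of the kind of Job I have in mind, following the "one Job per work item" expansion pattern. The image name (process-chunk), mount path (/data), chunk file name, and claim name are just placeholders I made up for illustration:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: process-chunk-0            # one Job per chunk, templated per file
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: process-chunk:latest    # hypothetical image doing the per-file computation
            args: ["/data/chunk-0.dat"]    # the file this pod should work on
            volumeMounts:
            - name: workdir
              mountPath: /data             # shared volume holding the split files
          restartPolicy: Never
          volumes:
          - name: workdir
            persistentVolumeClaim:
              claimName: workdir-claim     # claim that all worker pods would mount
      backoffLimit: 2

The part I cannot work out is how that workdir-claim volume can be shared by all the pods at once.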

I am having trouble figuring out how to transfer the files. From what I have read, it seems that a PersistentVolume cannot be bound to more than one pod at once. So how do I go about passing a file to a single pod in a Job?

Any suggestions or general direction would be greatly appreciated.

-- arch
jobs
kubernetes
parallel-processing
persistent-volumes

1 Answer

3/20/2018

PersistentVolume cannot be bound to more than one pod at once.

Whether a PV is shared among Nodes/Pods (or not) is determined by its accessModes; it is not the case that all PVs are universally bound to just one Node/Pod.
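As a minimal sketch (the claim name and storage size below are placeholders), a claim that asks for a shared volume simply lists ReadWriteMany in its accessModes:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: workdir-claim
    spec:
      accessModes:
      - ReadWriteMany        # allow the volume to be mounted read-write by many nodes/pods
      resources:
        requests:
          storage: 10Gi      # placeholder size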

As the chart on that page shows, there are many PV technologies that tolerate ReadWriteMany, the most famous of them being NFS.
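For example, an NFS-backed PV could look roughly like this; the server address and export path are placeholders for your own NFS share:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: workdir-nfs
    spec:
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteMany            # NFS can be mounted from many nodes at once
      nfs:
        server: 10.0.0.10        # placeholder: address of your NFS server
        path: /exports/workdir   # placeholder: exported directory holding the files

Any pod whose claim binds to such a PV mounts the same directory, so the splitter and every worker pod in the Job can read and write the same set of files.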

-- mdaniel
Source: StackOverflow