mount a different volume for each job in kubernetes

6/16/2018

We have a process which takes one input: a directory location (an NFS share path). The process reads from the directory, does some processing, and writes back to it.

Due to the nature of the contents, directory and process permissions are set in such a way that the process can access only that directory and nothing else. This process is short-lived (~1 minute) and there may be hundreds of thousands of invocations each day, each time on a different directory.

We are trying to move this workload to a docker/kubernetes environment. One way I can think of is:

  1. Create a PersistentVolume for the directory
  2. Create a PersistentVolumeClaim and bind it
  3. Mount the above PVC to the pod specification for the job
  4. Once the job is complete, delete the PV, PVC and job
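Roughly, the per-job objects for steps 1-3 would look like the following sketch (all names, paths, and sizes here are just placeholders, not our actual setup):

```yaml
# Per-job PV backed by the job's NFS directory
apiVersion: v1
kind: PersistentVolume
metadata:
  name: job-1234-pv          # hypothetical per-job name
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com  # placeholder NFS server
    path: /exports/job-1234  # placeholder per-job directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: job-1234-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  volumeName: job-1234-pv    # bind directly to the PV above
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-1234
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: my-process:latest  # placeholder image
          args: ["/work"]           # the process takes the directory as input
          volumeMounts:
            - name: work
              mountPath: /work
      volumes:
        - name: work
          persistentVolumeClaim:
            claimName: job-1234-pvc
```

All three objects (plus the Job's pod) would have to be created and deleted for every invocation.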

Just looking at the steps, I think it might be overkill or a lot of overhead (lots of objects to be created in k8s, and the underlying volume mounted/unmounted on the host for each job).

Any other ideas?

-- Raj
docker
kubernetes

1 Answer

6/17/2018

Any other ideas?

If I understand your setup right, there are several approaches, each with its own pros and cons. I'll try to list some of the ideas:

  • As you noted, you can create all of the resources (PV, PVC, ...) each time. Now, I wouldn't be worried about a "lot of objects" in k8s, but such an approach can introduce significant overhead and an execution-time penalty. If your process is indeed short-lived (~1 minute), then binding, starting, and teardown can introduce said overhead. The pro is better isolation and concurrency; the con is the introduced overhead.

  • Another approach might be to make directory structure like so:

    /root_of_raw_data
        |
        +-- /process_folder_1
        |
        +-- /process_folder_2
        |
       ...
        |
        +-- /process_folder_n

    and then make a PV and PVC that point only to /root_of_raw_data, supposing that your provisioner allows ReadWriteMany (and an NFS provisioner should allow that). Then you wouldn't need time to set up / tear down the PV/PVC (they would be constantly bound), and on each pod start you would use subPath to mount /process_folder_x (where x corresponds to that very process) to, say, /my_process_work_folder inside that pod, and then start the process with /my_process_work_folder. The pro is that you don't have to introduce overhead for PV/PVC binding; the con is that you still have the overhead of pod start/teardown.
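    A sketch of the subPath variant, assuming a pre-bound ReadWriteMany claim named raw-data-pvc (that name and the image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: process-x
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: my-process:latest        # placeholder image
      args: ["/my_process_work_folder"]
      volumeMounts:
        - name: raw-data
          mountPath: /my_process_work_folder
          subPath: process_folder_x   # one subfolder per invocation
  volumes:
    - name: raw-data
      persistentVolumeClaim:
        claimName: raw-data-pvc       # constantly-bound claim over /root_of_raw_data
```

    Only the subPath (and pod name) changes between invocations; the PV/PVC are reused as-is.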

  • Yet another approach could be to have the same directory structure as above, but instead of using subPath to mount process folders into pods individually, you actually mount the /root_of_raw_data folder to, say, /my_root_work_folder inside a pod. Then you would start the process with /my_root_work_folder/process_folder_x (again, x being tied to the process in question). This way you could leave the pod running all the time (or multiple pods if needed, again provided ReadWriteMany can be used for the PV) and, instead of starting/tearing down pods, simply call kubectl -n my-process-namespace exec -it my-process-pod-name -- my_process_start_command /my_root_work_folder/process_folder_y. The pro is that you don't have any start/stop overhead at all; the con is that you have pod(s) constantly running and they all share the same process root folder.
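  The long-running pod for this variant might look like the following sketch (again assuming a hypothetical raw-data-pvc claim and placeholder names):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-process-pod-name
  namespace: my-process-namespace
spec:
  containers:
    - name: worker
      image: my-process:latest
      command: ["sleep", "infinity"]  # keep the pod alive between invocations
      volumeMounts:
        - name: raw-data
          mountPath: /my_root_work_folder  # whole root, no subPath
  volumes:
    - name: raw-data
      persistentVolumeClaim:
        claimName: raw-data-pvc
```

  Each invocation is then just the kubectl exec call shown above; no new k8s objects are created per job.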

You can also make variations on the mentioned approaches using Jobs if you need pod logs, or, alternatively, you can build schedulers around pod usage and such... This answer was aimed mainly at giving you some other angles on eliminating the potential setup/teardown overhead and is by no means an exhaustive list of approaches.

-- Const
Source: StackOverflow