I have hundreds of binary files varying in size from 5 MB to 500 MB, and a Python script that takes one file as input and outputs a small .txt file in about 10 minutes (for a 250 MB file).
To process them all as quickly as possible, I have 10 (local) servers with 20 cores each. What would be the best way to split this job up, keeping in mind that I'd like to add more hardware later? I'm certain this has been done a million times before and that there should be some open-source solution.
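To make the question concrete, this is roughly what I mean by splitting the work on a single server; it assumes script.py is invoked as `script.py <input_file> <output_dir>` and the directory names are just placeholders, so treat it as a sketch rather than my actual setup:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

INPUT_DIR = Path("/mnt/shrd_drive/input")    # placeholder layout
OUTPUT_DIR = Path("/mnt/shrd_drive/output")  # placeholder layout

def process_one(path: Path) -> int:
    # Each call runs one copy of script.py on one binary file.
    return subprocess.run(
        [sys.executable, "script.py", str(path), str(OUTPUT_DIR)]
    ).returncode

def main() -> None:
    files = sorted(p for p in INPUT_DIR.iterdir() if p.is_file())
    # Threads are enough here because the real work happens in the child
    # processes; 20 workers keep one 20-core server fully busy.
    with ThreadPoolExecutor(max_workers=20) as pool:
        for path, rc in zip(files, pool.map(process_one, files)):
            if rc != 0:
                print(f"failed: {path}")

if __name__ == "__main__":
    main()
```

The open question is how to extend this across 10 servers (and more later) without manually partitioning the file list.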
I was thinking of Kubernetes, because its Docker containers can easily isolate the dependencies of script.py, and of putting all the binary files on a single network share mounted on every server at /mnt/shrd_drive, from which they can all read.
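As a point of comparison, here is the scheduler-free approach I can imagine on top of that shared drive: every server runs the same loop and claims a file by atomically renaming it into its own directory. This assumes the share gives atomic rename semantics, and the todo/claimed/done layout is made up for illustration. Is something like Kubernetes worth it over this kind of setup?

```python
import os
import socket
import subprocess
import sys
from pathlib import Path

TODO = Path("/mnt/shrd_drive/todo")                      # assumed layout
CLAIMED = Path("/mnt/shrd_drive/claimed") / socket.gethostname()
DONE = Path("/mnt/shrd_drive/done")

def claim_next() -> Path | None:
    CLAIMED.mkdir(parents=True, exist_ok=True)
    for candidate in TODO.iterdir():
        target = CLAIMED / candidate.name
        try:
            # rename is atomic on a single filesystem, so only one server
            # should win the race for any given file.
            os.rename(candidate, target)
            return target
        except OSError:
            continue  # another server got there first
    return None

def main() -> None:
    DONE.mkdir(parents=True, exist_ok=True)
    # Keep claiming and processing files until the todo directory is empty.
    while (path := claim_next()) is not None:
        subprocess.run(
            [sys.executable, "script.py", str(path), str(DONE)], check=False
        )

if __name__ == "__main__":
    main()
```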