Kubeflow: handling large dynamic arrays and ParallelFor with current size limitations

1/22/2020

I've been struggling to find a good solution for this problem for the past day and would like to hear your thoughts.

I have a pipeline that receives a large, dynamic JSON array (containing only stringified objects), and I need to create a ContainerOp for each entry in that array (using dsl.ParallelFor).

This works fine for small inputs.

Right now the array comes in as an HTTP URL to a file, because of the pipeline input argument size limitations of Argo and Kubernetes (or that is what I understood from the currently open issues). But when I try to read the file in one Op and use its output as the input for the ParallelFor, I run into the output size limitation instead.
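
Roughly, the setup looks like the sketch below (the image names, the example URL, and the op/function names here are placeholders, not my actual code):

```python
from kfp import dsl


def read_json_op(url):
    # Downloads the JSON file and exposes its contents as an output
    # parameter. Passing the whole file content downstream like this is
    # where the output size limit is hit for large arrays.
    return dsl.ContainerOp(
        name='read-json',
        image='curlimages/curl:7.68.0',
        command=['sh', '-c'],
        arguments=['curl -sL "$0" -o /tmp/items.json', url],
        file_outputs={'items': '/tmp/items.json'},
    )


def process_item_op(item):
    # Per-item step; image and command are placeholders.
    return dsl.ContainerOp(
        name='process-item',
        image='my-registry/process-item:latest',
        command=['python', 'process.py'],
        arguments=['--item', item],
    )


@dsl.pipeline(
    name='parallelfor-large-array',
    description='Fan out over a dynamic JSON array',
)
def pipeline(json_url: str = 'http://example.com/items.json'):
    read = read_json_op(json_url)
    # ParallelFor expects a JSON list; feeding it the downloaded file
    # content as a parameter works for small inputs only.
    with dsl.ParallelFor(read.outputs['items']) as item:
        process_item_op(item)
```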

What would be a good & reusable solution for such a scenario?

Thanks!

-- Yoni Cohen
argo-workflows
argoproj
kubeflow
kubernetes

0 Answers