Dynamic "Fan-In" for artifact outputs in Argo?

1/9/2022

I have an Argo workflow with dynamic fan-out tasks that perform a map operation (in the Map-Reduce sense). I want to create a reducer that aggregates their results. This is possible when each mapper's output is small enough to be emitted as an output parameter. See this SO question-answer for a description of how to do it.

But how can I aggregate output artifacts with Argo without writing custom logic in each mapper to write them to some storage, and then reading them back in the reducer?

-- Alexander Reshytko
argo-workflows
directed-acyclic-graphs
kubernetes

1 Answer

1/9/2022

Artifacts are more difficult to aggregate than parameters.

Parameters are always text and are generally small. This makes it easy for Argo Workflows to aggregate them into a single JSON object which can then be consumed by a "reduce" step.
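For context, here is a minimal sketch of the parameter case (template names and values are illustrative, not from the question): when a task fans out with `withItems` or `withParam`, Argo aggregates each iteration's output parameters into a single JSON list, which a downstream task can take as one input parameter.

```yaml
# Hypothetical sketch: fan out over items, then consume the aggregated
# JSON list of output parameters in a reduce step.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: param-fan-in-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: map
            template: mapper
            arguments:
              parameters: [{name: item, value: "{{item}}"}]
            withItems: [a, b, c]
          - name: reduce
            template: reducer
            dependencies: [map]
            arguments:
              parameters:
                # Argo aggregates the loop's outputs into a JSON list.
                - name: results
                  value: "{{tasks.map.outputs.parameters}}"
    - name: mapper
      inputs:
        parameters: [{name: item}]
      container:
        image: alpine:3
        command: [sh, -c]
        args: ["echo -n mapped-{{inputs.parameters.item}} > /tmp/out"]
      outputs:
        parameters:
          - name: mapped
            valueFrom: {path: /tmp/out}
    - name: reducer
      inputs:
        parameters: [{name: results}]
      container:
        image: alpine:3
        command: [sh, -c]
        args: ["echo '{{inputs.parameters.results}}'"]
```

The reduce step then parses that JSON list however it likes; this only works because parameters are small text values.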

Artifacts, on the other hand, may be any type or size. So Argo Workflows is limited in how much it can help with aggregation.

The main relevant feature it provides is declarative repository write/read operations. You can specify, for example, an S3 prefix to write each mapper's output artifact to. Then, in the reduce step, you can load everything from that prefix and perform your aggregation logic.
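A minimal sketch of that approach (the keys and template names are assumptions, and the cluster's default artifact repository is assumed to be configured): each mapper writes its artifact to a distinct key under a shared S3 prefix, and the reducer declares an input artifact whose key is that prefix, so Argo downloads every object under it into a directory.

```yaml
# Hypothetical sketch: mappers write artifacts under one S3 prefix;
# the reducer reads the whole prefix back as a directory.
    - name: mapper
      inputs:
        parameters: [{name: part}]
      container:
        image: alpine:3
        command: [sh, -c]
        args: ["echo result-{{inputs.parameters.part}} > /tmp/part.txt"]
      outputs:
        artifacts:
          - name: part
            path: /tmp/part.txt
            s3:
              # One object per mapper, all under a common prefix
              # (bucket/credentials come from the default repository config).
              key: "{{workflow.name}}/parts/part-{{inputs.parameters.part}}.txt"
    - name: reducer
      inputs:
        artifacts:
          - name: parts
            # Argo downloads everything under the prefix into this directory.
            path: /parts
            s3:
              key: "{{workflow.name}}/parts"
      container:
        image: alpine:3
        command: [sh, -c]
        args: ["cat /parts/* > /tmp/reduced.txt"]
      outputs:
        artifacts:
          - name: reduced
            path: /tmp/reduced.txt
```

The aggregation itself (here just `cat`) is still your own logic; Argo only handles moving the artifacts to and from the repository.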

Argo Workflows provides a generic map/reduce example. But besides artifact writing/reading, you pretty much have to do the aggregation logic yourself.

-- crenshaw-dev
Source: StackOverflow