How do I run a multi-step cron job while still being able to execute a single step manually?

9/27/2018

I have a data pipeline in Go with steps A, B and C. Currently those are three binaries. They share the same database but write to different tables. When developing locally, I have just been running ./a && ./b && ./c. I'm looking to deploy this pipeline to our Kubernetes cluster.

I want A -> B -> C to run once a day, but sometimes (for debugging etc.) I may just want to manually run A or B or C in isolation.

Is there a simple way of achieving this in Kubernetes?

I haven't found many resources on this. Does that point to a problem with my application's design?

-- Jamie Patel
etl
go
kubernetes
pipeline

1 Answer

9/27/2018

Create a Docker image that contains all three binaries plus a wrapper script that runs them in order.

Then deploy a Kubernetes CronJob with the appropriate schedule that runs all three sequentially, using the wrapper script as the container's entrypoint/command.

For debugging, you can then run the same image manually and invoke a single step from a shell:

kubectl -n <namespace> run debug -it --rm --image=<image> -- /bin/sh
$ ./b
...
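If you would rather not open a shell to debug, the wrapper could also accept optional step names. Below is a sketch under the same path assumptions, with hypothetical step names a, b and c. With no arguments it runs the whole pipeline (what the CronJob would do); with arguments it runs only the named steps:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pipeline maps hypothetical step names to assumed binary paths, in execution order.
var pipeline = []struct{ name, path string }{
	{"a", "./a"},
	{"b", "./b"},
	{"c", "./c"},
}

// selectSteps returns the binaries to run: every step when no names are
// given, otherwise only the named steps, in the order they were given.
func selectSteps(names []string) ([]string, error) {
	if len(names) == 0 {
		paths := make([]string, 0, len(pipeline))
		for _, s := range pipeline {
			paths = append(paths, s.path)
		}
		return paths, nil
	}
	var paths []string
	for _, n := range names {
		found := false
		for _, s := range pipeline {
			if s.name == n {
				paths = append(paths, s.path)
				found = true
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("unknown step %q", n)
		}
	}
	return paths, nil
}

func main() {
	paths, err := selectSteps(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// Run the selected steps in order, stopping at the first failure.
	for _, p := range paths {
		cmd := exec.Command(p)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "step %s failed: %v\n", p, err)
			os.Exit(1)
		}
	}
}
```

For example, if this were built into the image as a binary called pipeline (a hypothetical name), ./pipeline would run A, B and C in order, while ./pipeline b would run only B.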
-- Jukka
Source: StackOverflow