I have a data pipeline in Go with steps A, B, and C. Currently these are three separate binaries. They share the same database but write to different tables. When developing locally, I have just been running ./a && ./b && ./c. I'm looking to deploy this pipeline to our Kubernetes cluster.
I want A -> B -> C to run once a day, but sometimes (for debugging etc.) I may want to run just A, B, or C in isolation.
Is there a simple way of achieving this in Kubernetes?
I haven't found many resources on this, so maybe that points to an issue with my application's design?
Create a Docker image that holds all three binaries, plus a wrapper script that runs them in order.
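A minimal sketch of the wrapper and the image. The /app paths, the ./cmd/a layout, and the base image tags are assumptions; adjust them to your repository.

```sh
#!/bin/sh
# run-pipeline.sh -- run the three pipeline steps in order,
# aborting as soon as one of them fails.
set -e
/app/a
/app/b
/app/c
```

```dockerfile
# Multi-stage build, assuming the binaries live at ./cmd/a, ./cmd/b, ./cmd/c.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/a ./cmd/a \
 && CGO_ENABLED=0 go build -o /out/b ./cmd/b \
 && CGO_ENABLED=0 go build -o /out/c ./cmd/c

FROM alpine:3.20
WORKDIR /app
COPY --from=build /out/a /out/b /out/c ./
COPY run-pipeline.sh .
RUN chmod +x run-pipeline.sh
# CMD rather than ENTRYPOINT, so the command is trivially
# overridable when you want a shell for debugging.
CMD ["/app/run-pipeline.sh"]
```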
Then deploy a Kubernetes CronJob with the appropriate schedule that runs this image; since the wrapper script is the container's command, each scheduled run executes all three steps sequentially.
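A sketch of the manifest, assuming the image is pushed as registry.example.com/pipeline (a hypothetical name) and a daily 03:00 run:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pipeline
spec:
  schedule: "0 3 * * *"       # once a day at 03:00; adjust to taste
  concurrencyPolicy: Forbid   # don't start a new run while one is still going
  jobTemplate:
    spec:
      backoffLimit: 0         # raise this if re-running a failed day is safe
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pipeline
              image: registry.example.com/pipeline:latest
              # no command/args: the image's CMD (the wrapper script) runs
```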
For debugging you can then just run the same image manually:
```sh
kubectl -n XXX run debug -it --rm --image=<image> -- /bin/sh
$ ./b
...
```
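And if you want to trigger the whole A -> B -> C run manually rather than a single step, you can create a one-off Job from the CronJob spec (assuming the CronJob is named pipeline, as in the sketch above):

```sh
kubectl -n XXX create job --from=cronjob/pipeline pipeline-manual-1
```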