I have a Golang server that deploys AI model training jobs to Kubernetes, where each training job runs in its own pod. After a job completes, my server needs to upload the model output to HDFS/S3.
So I need a post-process container to handle the upload task, analogous to how an init container handles initialization tasks in a Kubernetes pod.
For now, I use a workaround: I put the model training container into the init containers and run the upload task container as a regular container. This works as long as the init container does not fail. However, if the init container fails, the pod status is Init:ContainerCannotRun, whereas it should normally be Failed.
I know I can attach a preStop command to the container's lifecycle events if the image contains the HDFS/S3 upload command-line tool. However, I do not want the model training images to include these tools, so this does not solve my problem.
So my question is: how can I implement a post-process container so that the upload task runs after the job completes?
I also found a related issue on GitHub; I will try it if there is no other option.
There are two ways to handle this in Kubernetes: a preStop hook, and code in the container that listens for the SIGTERM signal.
You can also increase the termination grace period (terminationGracePeriodSeconds in the pod spec) to allow more time before the pod is forcibly terminated.
You can find more information in the Kubernetes documentation page Attach Handlers to Container Lifecycle Events and in Kubernetes best practices: terminating with grace.