Stateful jobs in Kubernetes

4/17/2019

I need to run an ad-hoc job once in a while. The job needs some state to work, and building that state takes a long time, so I want to keep the state persistent and reuse it in subsequent runs for a fast turnaround. I want this job to be managed as K8s pods.

This is the complete set of requirements:

  1. Pods will go down after the work finishes. The K8s controller should not try to bring the pods back up.
  2. Each pod should have a persistent volume attached to it. There should be 1 volume per pod. I am planning to use EBS.
  3. We should be able to manually bring the pods back up in the future.
  4. Future runs may have more or fewer replicas than past runs.

I know K8s supports both Jobs and StatefulSets. Is there any controller that supports both at the same time?

-- Ashish Tyagi
jobs
kubernetes
stateful

1 Answer

4/17/2019
  1. Pods will go down after the work finishes. The K8s controller should not try to bring the pods back up.

This is what Jobs do: they run to completion. The only thing you control is whether a pod is retried when it exits with a non-zero code.
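As a rough illustration of the run-to-completion behaviour, here is a minimal sketch that builds a Job manifest as a Python dict and prints it as JSON. The job name and image are hypothetical placeholders, and the choice of `restartPolicy: Never` with `backoffLimit: 0` is only one way to say "do not retry".

```python
import json


def job_manifest(name, image, backoff_limit=0):
    """Build a batch/v1 Job manifest as a plain dict.

    name and image are hypothetical values used for illustration only.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked failed; 0 means no retries.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # A Job's pod template only allows Never or OnFailure here.
                    "restartPolicy": "Never",
                    "containers": [{"name": "worker", "image": image}],
                }
            },
        },
    }


# Print the manifest as JSON (kubectl accepts JSON as well as YAML).
print(json.dumps(job_manifest("adhoc-job", "example.com/worker:latest"), indent=2))
```

The output could then be piped to `kubectl apply -f -`. Raising `backoff_limit` would allow that many retries on failure.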

  2. Pods should have a persistent volume attached to them.

The same volume for all pods? Will they write or only read? What volume backend do you have, AWS EBS or similar? Depending on the answers, you might split the input data across a few volumes, or have each pod write to its own volume and then run a finalization job that assembles the results into one volume (a kind of map-reduce). Alternatively, use a volume backend that supports multi-mount read-write access: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes (see the ReadWriteMany column in the table)
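Since the question asks for one EBS volume per pod, the "separate volumes" option could be sketched as one PersistentVolumeClaim plus one Job per worker, with each Job mounting its own claim. This is only a sketch under assumptions: the names `state-<i>` and `worker-<i>`, the mount path `/state`, the image, and the 100Gi size are all hypothetical, and it assumes the cluster's default StorageClass provisions EBS volumes (which support only ReadWriteOnce).

```python
import json


def pvc_manifest(index, size="100Gi"):
    """One claim per worker; the claim name and size are hypothetical."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"state-{index}"},
        "spec": {
            # EBS-backed volumes can be mounted read-write by a single node only.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def worker_job(index, image):
    """One Job per worker, mounting the claim with the matching index."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"worker-{index}"},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        "volumeMounts": [{"name": "state", "mountPath": "/state"}],
                    }],
                    "volumes": [{
                        "name": "state",
                        "persistentVolumeClaim": {"claimName": f"state-{index}"},
                    }],
                }
            }
        },
    }


def manifests(replicas, image):
    """Bundle all claims and Jobs into a single List object."""
    items = []
    for i in range(replicas):
        items.append(pvc_manifest(i))
        items.append(worker_job(i, image))
    return {"apiVersion": "v1", "kind": "List", "items": items}


print(json.dumps(manifests(2, "example.com/worker:latest"), indent=2))
```

The claims are independent objects, so they outlive the Jobs: a finished Job can be deleted and a Job with the same claim name created later to pick up the saved state. Calling `manifests` with a larger replica count on a later run would add new claims for the extra workers, while the existing ones keep their data.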

  3. We should be able to manually bring the pods back up in the future.

Jobs fit here: you launch one when you need it, and it runs to completion.

  4. Future runs may have more or fewer replicas than past runs.

Jobs fit here. Specify different completions or parallelism when you launch a job: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs
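To make the completions and parallelism settings concrete, here is a minimal sketch that generates a Job manifest with both fields set per run. The job names, image, and numbers are hypothetical; only the `completions` and `parallelism` fields come from the Job spec itself.

```python
import json


def parallel_job(name, image, completions, parallelism):
    """Build a Job that needs `completions` successful pods in total,
    running at most `parallelism` pods at any one time."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": completions,
            "parallelism": parallelism,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{"name": "worker", "image": image}],
                }
            },
        },
    }


# Two runs with different sizes (names and numbers are illustrative only).
small_run = parallel_job("adhoc-run-1", "example.com/worker:latest", 4, 2)
large_run = parallel_job("adhoc-run-2", "example.com/worker:latest", 10, 5)
print(json.dumps([small_run, large_run], indent=2))
```

Each run gets its own Job object, so the replica count is chosen at launch time rather than fixed in advance.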

StatefulSets are a different concept. They are mostly used for clustered software that runs continuously and needs to persist a role per pod (e.g. a shard).

-- Max Lobur
Source: StackOverflow