I'm wondering for a batch distributed job I need to run. Is there a way in K8S if I use a Job/Stateful Set or whatever, a way for the pod itself(via ENV var or whatever) to know its 1 of X pods run for this job?
I need to chunk up some data and have each process fetch the stuff it needs.
--
I guess the statefulset hostname setting is one way of doing it. Is there a better option?
You could write some infrastructure as code using Ansible that will perform the following tasks in order:
kubectl create -f jobs.yml
kubectl wait --for=condition=complete job/job1
kubectl wait --for=condition=complete job/job2
kubectl wait --for=condition=complete job/job3
kubectl create -f pod.yml
kubectl wait can be used in situations like this to halt progress until an action has been performed. In this case, a job has completed its run.
Here is a similar question that someone asked on StackOverflow before.
This is planned but not yet implemented that I know of. You probably want to look into higher order layers like Argo Workflows or Airflow instead for now.