I'm trying to do something simple: create two pods within a Job. I'm looking at the documentation here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#single-job-starts-controller-pod
While the documentation discusses parallelization, it doesn't give much in the way of examples. The only example it gives runs a single pod:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
To create two pods I attempted effectively this:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi1
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
    spec:
      containers:
      - name: pi2
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
But that didn't get me two pods; instead, it appears that only the second container ran, in a single pod.
It's not clear to me how to get my Job to launch multiple pods. My pods don't need to run on the same machine, but each needs a unique environment variable set so it knows which part of the work to do. The work is embarrassingly parallel and split across a fixed number of pods (2 in this example).
You cannot have two different pod templates in the same Job; a duplicate spec: key simply overrides the earlier one, which is why only your second container ran. From the docs here,
Parallel Jobs with a work queue:
- do not specify .spec.completions, default to .spec.parallelism.
- the Pods must coordinate amongst themselves or an external service to determine what each should work on. For example, a Pod might fetch a batch of up to N items from the work queue.
- each Pod is independently capable of determining whether or not all its peers are done, and thus that the entire Job is done.
- when any Pod from the Job terminates with success, no new Pods are created.
- once at least one Pod has terminated with success and all Pods are terminated, then the Job is completed with success.
- once any Pod has exited with success, no other Pod should still be doing any work for this task or writing any output. They should all be in the process of exiting.
For a work queue Job, you must leave .spec.completions unset, and set .spec.parallelism to a non-negative integer.
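A rough sketch of that pattern is below; the name pi-workqueue is made up, and it assumes the two workers divide the work by coordinating amongst themselves or via an external queue, since every pod in a Job is created from the same single template (so a per-pod environment variable can't come from the template itself):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-workqueue       # hypothetical name
spec:
  parallelism: 2           # run two pods at the same time
  # .spec.completions is deliberately left unset, per the work-queue pattern above
  template:
    spec:
      containers:
      - name: worker
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

With .spec.parallelism set to 2, the Job controller starts two pods from the same template, and per the quoted docs the Job is considered successful once at least one pod has succeeded and all pods have terminated.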