How to specify two pods in the same Job in a Kubernetes YAML file?

7/1/2020

I'm trying to do something simple, just create two pods within a job. I'm looking at the documentation here: https://kubernetes.io/docs/concepts/workloads/controllers/job/#single-job-starts-controller-pod

While the documentation discusses parallelization, it doesn't give much in the way of examples. The only example it gives runs a single pod:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

To create two pods, I effectively attempted this:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi1
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
    spec:
      containers:
      - name: pi2
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

But that didn't get me two pods; only the second container ran, in a single pod, apparently because the duplicated spec: key is resolved by keeping only the last one, silently discarding the first.

It's not clear to me how to get my job to launch multiple pods. My pods don't need to run on the same machine, but each needs a unique environment variable set so it knows which part of the work to do. The work divides in an embarrassingly parallel way across a fixed number of pods (2 in this example).
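
To illustrate, each pod's template would ideally include something like the following (WORK_PART is just a placeholder name for illustration):

  template:
    spec:
      containers:
      - name: pi1
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        env:
        - name: WORK_PART     # placeholder name, just for illustration
          value: "0"          # the second pod would need "1" here
      restartPolicy: Never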

-- David Parks
kubernetes
kubernetes-pod

1 Answer

7/1/2020

You cannot have two different pod templates in the same Job; a Job takes exactly one .spec.template. From the docs here:

Parallel Jobs with a work queue:

  • do not specify .spec.completions; it defaults to .spec.parallelism.

  • the Pods must coordinate amongst themselves or with an external service to determine what each should work on. For example, a Pod might fetch a batch of up to N items from the work queue.

  • each Pod is independently capable of determining whether or not all its peers are done, and thus that the entire Job is done. When any Pod from the Job terminates with success, no new Pods are created.

  • once at least one Pod has terminated with success and all Pods are terminated, then the Job is completed with success.

  • once any Pod has exited with success, no other Pod should still be doing any work for this task or writing any output. They should all be in the process of exiting.

For a work queue Job, you must leave .spec.completions unset and set .spec.parallelism to a non-negative integer.
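
To make that concrete, here is a minimal sketch of the pi Job adapted to the work queue pattern (the name pi-parallel is my own; in a real work queue setup the containers would fetch work items from an external queue service rather than run a fixed command):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel
spec:
  parallelism: 2    # run two identical pods at once
  # .spec.completions is deliberately left unset (work queue pattern)
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

Both pods share the same template, so you cannot give each one a different environment variable this way; the pods have to discover their share of the work themselves. For a fixed fan-out like yours, the Kubernetes docs also describe creating one near-identical Job per partition (the "Parallel Processing using Expansions" pattern), each with its own environment variable.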

-- Arghya Sadhu
Source: StackOverflow