Kubernetes with multiple jobs counter

10/1/2019

New to Kubernetes, I'm trying to move an existing pipeline of ours that currently runs on a queuing system without k8s.

I have a Perl script that generates the batch Job YAML files, one per sample to process. Then I run kubectl apply --recursive -f 16S_jobscripts/

Each sample needs to be treated sequentially and go through several processing steps.

Example:

SampleA -> clean -> quality -> some_calculation

SampleB -> clean -> quality -> some_calculation

and so on for 300 samples.

So the idea is to prepare all the YAML files and run them sequentially. This is working.

BUT, with this approach I need to wait until all samples have finished a stage before starting the next one (e.g., all the clean jobs need to complete before I can run the next stage, quality).

What would be the best approach in such a case? Running each sample independently? How?

The YAML below describes one job for one sample. You can see that I'm using a counter (mergereads-1 for sample 1, i.e. SampleA).

apiVersion: batch/v1
kind: Job
metadata:
  name: mergereads-1
  namespace: namespace-id-16s
  labels:
    jobgroup: mergereads
spec:
  template:
    metadata:
      name: mergereads-1
      labels:
        jobgroup: mergereads
    spec:
      containers:
        - name: mergereads-1  # the Perl script fills in the counter ($idx) per sample
          image: .../bbmap:latest
          command: ['sh', '-c']
          args:
            - |
              cd /workdir &&
              bbmerge.sh -Xmx1200m in1=files/trimmed/1.R1.trimmed.fq.gz in2=files/trimmed/1.R2.trimmed.fq.gz out=files/mergedpairs/1.merged.fq.gz merge=t mininsert=300 qtrim2=t minq=27 ratiomode=t &&
              ls files/mergedpairs/
          resources:
            limits:
              cpu: 1
              memory: 2000Mi
            requests:
              cpu: 0.8
              memory: 1500Mi
          volumeMounts:
            - mountPath: /workdir
              name: db
      volumes:
        - name: db
          persistentVolumeClaim:
            claimName: workdir
      restartPolicy: Never
-- david
kubernetes
kubernetes-pod

1 Answer

10/2/2019

If I understand you correctly, you can use parallel Jobs together with one of the documented Job patterns.

These patterns support parallel processing of a set of independent but related work items, which is what your 300 samples are. See the sketch below.
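Since you already generate one YAML file per sample, the simplest fit is the job template expansion pattern: make each sample its own Job that runs its three steps in order inside a single pod, so no sample ever waits on another. A minimal sketch, assuming hypothetical wrapper scripts clean.sh, quality.sh and some_calculation.sh are available in your image:

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-1              # the generator stamps one Job per sample id
  labels:
    jobgroup: 16s-pipeline
spec:
  template:
    spec:
      containers:
        - name: pipeline
          image: .../bbmap:latest
          command: ['sh', '-c']
          # the three steps run sequentially inside this one pod,
          # while the 300 Jobs run independently of each other
          args: ['clean.sh 1 && quality.sh 1 && some_calculation.sh 1']
      restartPolicy: Never

Each Job then owns its sample end to end, and a single kubectl apply on the directory starts all samples at once, throttled only by your cluster's capacity and the resource requests you already set.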

You can also consider using Argo: https://github.com/argoproj/argo

Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
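With Argo you can express both the per-sample ordering and the cross-sample parallelism in one manifest. A minimal sketch, again assuming hypothetical per-step scripts; the withItems list would be expanded to your 300 sample ids:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: 16s-pipeline-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: process                  # one branch per sample, all in parallel
            template: per-sample
            arguments:
              parameters:
                - name: sample
                  value: '{{item}}'
            withItems: [1, 2, 3]           # expand to all 300 sample ids
    - name: per-sample                     # the three steps for one sample, in order
      inputs:
        parameters:
          - name: sample
      steps:
        - - name: clean
            template: run
            arguments:
              parameters:
                - {name: sample, value: '{{inputs.parameters.sample}}'}
                - {name: script, value: clean.sh}
        - - name: quality
            template: run
            arguments:
              parameters:
                - {name: sample, value: '{{inputs.parameters.sample}}'}
                - {name: script, value: quality.sh}
        - - name: calc
            template: run
            arguments:
              parameters:
                - {name: sample, value: '{{inputs.parameters.sample}}'}
                - {name: script, value: some_calculation.sh}
    - name: run                            # generic one-container step
      inputs:
        parameters:
          - name: sample
          - name: script
      container:
        image: .../bbmap:latest
        command: [sh, -c]
        args: ['{{inputs.parameters.script}} {{inputs.parameters.sample}}']

You submit this once with argo submit; the per-sample branches fan out in parallel while the steps inside each branch stay strictly ordered.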

Please let me know if that helps.

-- OhHiMark
Source: StackOverflow