I'm new to Kubernetes and am trying to migrate an existing pipeline that currently runs on a queuing system without k8s.
I have a Perl script that generates a list of batch jobs (yml files) for each of the samples that I have to process. Then I run kubectl apply --recursive -f 16S_jobscripts/
Each sample has to go through several processing steps, one after the other.
Example:
SampleA -> clean -> quality -> some_calculation
SampleB -> clean -> quality -> some_calculation
and so on for 300 samples.
So the idea is to prepare all the yml files and run them one step at a time. This is working.
BUT with this approach I have to wait until every sample has finished a step before starting the next one (for example, all the clean jobs need to complete before I can run the quality jobs).
What would be the best approach in this case? Should each sample run independently, and if so, how?
The yml below describes one job for one sample. You can see that I'm using a counter (mergereads-1 for sample 1, i.e. SampleA).
apiVersion: batch/v1
kind: Job
metadata:
  name: merge-reads-1
  namespace: namespace-id-16s
  labels:
    jobgroup: mergereads
spec:
  template:
    metadata:
      name: mergereads-1
      labels:
        jobgroup: mergereads
    spec:
      containers:
      - name: mergereads-$idx
        image: .../bbmap:latest
        command: ['sh', '-c']
        args: ['
          cd workdir &&
          bbmerge.sh -Xmx1200m in1=files/trimmed/1.R1.trimmed.fq.gz in2=files/trimmed/1.R2.trimmed.fq.gz out=files/mergedpairs/1.merged.fq.gz merge=t mininsert=300 qtrim2=t minq=27 ratiomode=t &&
          ls files/mergedpairs/
          ']
        resources:
          limits:
            cpu: 1
            memory: 2000Mi
          requests:
            cpu: 0.8
            memory: 1500Mi
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      volumes:
      - name: db
        persistentVolumeClaim:
          claimName: workdir
      restartPolicy: Never
If I understand you correctly, you can use parallel Jobs together with one of the Job patterns.
These patterns support parallel processing of a set of independent but related work items.
You could also consider Argo: https://github.com/argoproj/argo
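One way to make each sample independent with plain Jobs, as a rough sketch rather than a tested setup, is to generate one Job per sample and run the earlier steps as initContainers. Kubernetes starts initContainers one at a time and only starts the next after the previous one has exited successfully, so the steps stay ordered within a sample while the 300 Jobs progress independently. The Job name sample-1, the jobgroup value and the echo commands are placeholders I made up; the image is the elided one from your manifest and each step could use its own.

```yaml
# Sketch: one Job per sample, steps ordered via initContainers.
# Names and commands are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-1                  # hypothetical: one Job per sample counter
  namespace: namespace-id-16s
  labels:
    jobgroup: sample-pipeline     # hypothetical label
spec:
  backoffLimit: 0                 # do not retry the whole pod on failure
  template:
    metadata:
      labels:
        jobgroup: sample-pipeline
    spec:
      restartPolicy: Never
      initContainers:
      # Step 1: runs first; must exit 0 before "quality" starts.
      - name: clean
        image: .../bbmap:latest   # elided in the question; replace per step
        command: ['sh', '-c']
        args: ['cd /workdir && echo "clean step for sample 1"']
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      # Step 2: starts only after "clean" has succeeded.
      - name: quality
        image: .../bbmap:latest
        command: ['sh', '-c']
        args: ['cd /workdir && echo "quality step for sample 1"']
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      containers:
      # Final step: starts after all initContainers have succeeded.
      - name: some-calculation
        image: .../bbmap:latest
        command: ['sh', '-c']
        args: ['cd /workdir && echo "calculation step for sample 1"']
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      volumes:
      - name: db
        persistentVolumeClaim:
          claimName: workdir
```

Your Perl script would then emit one such file per sample instead of one file per sample and step, and the same kubectl apply --recursive call submits them all. The trade-off is that a failed step fails the whole sample's pod, so a rerun starts that sample from the first step again.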
Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
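As an illustration of how this could look for your case, here is a minimal Workflow sketch, assuming Argo Workflows is installed in the cluster. The outer template fans out over the sample counters 1 to 300 in parallel, and the inner template runs the three steps in order for one sample. The template names, the generateName prefix, the parallelism value and the echo commands are my own placeholders, and the image is the elided one from your manifest.

```yaml
# Sketch of an Argo Workflow: samples in parallel, steps in sequence.
# Template names and commands are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: samples-          # hypothetical name prefix
  namespace: namespace-id-16s
spec:
  entrypoint: all-samples
  parallelism: 20                 # assumed cap on pods running at once
  volumes:
  - name: db
    persistentVolumeClaim:
      claimName: workdir
  templates:
  # Fan out: one "sample-pipeline" run per counter value, all independent.
  - name: all-samples
    steps:
    - - name: per-sample
        template: sample-pipeline
        arguments:
          parameters:
          - name: idx
            value: "{{item}}"
        withSequence:
          start: "1"
          end: "300"
  # Per sample: each outer list entry waits for the previous one.
  - name: sample-pipeline
    inputs:
      parameters:
      - name: idx
    steps:
    - - name: clean
        template: run-step
        arguments:
          parameters:
          - {name: idx, value: "{{inputs.parameters.idx}}"}
          - {name: step, value: clean}
    - - name: quality
        template: run-step
        arguments:
          parameters:
          - {name: idx, value: "{{inputs.parameters.idx}}"}
          - {name: step, value: quality}
    - - name: some-calculation
        template: run-step
        arguments:
          parameters:
          - {name: idx, value: "{{inputs.parameters.idx}}"}
          - {name: step, value: some_calculation}
  # Generic container template; replace the echo with the real command.
  - name: run-step
    inputs:
      parameters:
      - name: idx
      - name: step
    container:
      image: .../bbmap:latest     # elided in the question; replace per step
      command: ['sh', '-c']
      args: ['cd /workdir && echo "{{inputs.parameters.step}} for sample {{inputs.parameters.idx}}"']
      volumeMounts:
      - mountPath: '/workdir'
        name: db
```

Because the manifest uses generateName, it has to be submitted with kubectl create -f or argo submit rather than kubectl apply. With this layout SampleB can already be in the quality step while SampleA is still cleaning, which is the behaviour you asked for.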
Please let me know if that helps.