I was able to get a Kubernetes Job up and running on AKS (it uses a Docker Hub image to process a biological sample and then upload the output to blob storage; this is done with a bash command that I provide in the args section of my YAML file). However, I have 20 samples and would like to spin up 20 nodes so that I can process the samples in parallel (one sample per node). How do I send each sample to a different node? The "parallelism" option in the YAML file runs all 20 samples on each of the 20 nodes, which is not what I want.
Thank you for the help.
How/where are the samples stored? You could load them (or a pointer to each actual sample) into a queue like Kafka and let the application retrieve each sample exactly once, then upload the output to blob storage after the computation. You can then even ensure that if a computation fails, another pod will pick it up and restart the computation.
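For example, a minimal sketch of a queue-backed Job, assuming a hypothetical image myrepo/sample-processor whose entrypoint pulls one sample pointer from the queue, processes it, uploads the result, and exits (the image name, broker address, and env variable names are placeholders, not from your setup):

apiVersion: batch/v1
kind: Job
metadata:
  name: process-samples
spec:
  completions: 20      # one successful pod per sample
  parallelism: 20      # run up to 20 pods at once
  template:
    spec:
      containers:
      - name: worker
        # hypothetical image: pulls ONE sample pointer from the queue,
        # processes it, uploads the output to blob storage, then exits
        image: myrepo/sample-processor:latest
        env:
        - name: QUEUE_BROKER        # placeholder broker address
          value: "kafka:9092"
        - name: QUEUE_TOPIC         # placeholder topic holding the 20 sample pointers
          value: "samples"
      restartPolicy: OnFailure      # a failed computation is retried by a new pod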
If you want each instance of the job to run on a different node, you can use a DaemonSet; that's exactly what it does: it provisions one pod per worker node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: k8s.gcr.io/fluentd-elasticsearch:1.20
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
Another way of doing that is using pod anti-affinity:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "app"
          operator: In
          values:
          - zk
      topologyKey: "kubernetes.io/hostname"
The requiredDuringSchedulingIgnoredDuringExecution field tells the Kubernetes scheduler that it should never co-locate two pods that have the app label set to zk in the domain defined by the topologyKey. The topologyKey kubernetes.io/hostname indicates that the domain is an individual node. Using different rules, labels, and selectors, you can extend this technique to spread your ensemble across physical, network, and power failure domains.
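Applied to your case, a rough sketch would be to give the Job's pods a common label and require anti-affinity against that same label, so the scheduler places each of the 20 pods on a different node (the app: sample-worker label and the image name are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: process-samples
spec:
  completions: 20
  parallelism: 20
  template:
    metadata:
      labels:
        app: sample-worker          # placeholder label shared by all worker pods
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - sample-worker     # never co-locate two worker pods on one node
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: worker
        image: myrepo/sample-processor:latest   # placeholder image
      restartPolicy: OnFailure

Note that with a hard (required) rule like this, the cluster needs at least 20 schedulable nodes or some pods will stay Pending.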