I have a K8s cluster which runs independent jobs (each job has one pod) and I expect them to run to completion. The scheduler, however, sometimes reschedules them on a different node. My jobs need to be single-run, and restarting them on a different node is not an acceptable outcome for me.
I was looking at Pod Disruption Budgets (PDB), but from what I understand their selectors match on pod labels. Since each of my jobs is different and has its own label, how do I use a PDB to tell K8s that all of my pods have a maxUnavailable of 0?
I have also used this annotation:
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
but it does not prevent pod evictions under resource pressure.
Ideally, I should be able to tell K8s that none of my Pods should be evicted unless they are complete.
You should specify resource requests and limits so that your job pods get the Guaranteed quality-of-service (QoS) class:
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"
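Requests must equal limits for both CPU and memory in every container of the pod. The pod is then assigned the Guaranteed QoS class, which makes it the last candidate for eviction when the node comes under resource pressure. Note that this lowers the risk rather than removing it entirely: a Guaranteed pod can still be killed if it exceeds its own memory limit or if the node itself fails.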
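For context, here is a minimal sketch of how that resources block could sit inside a complete Job manifest. The job name, container name, image and command are placeholders I made up for illustration. The `backoffLimit: 0` and `restartPolicy: Never` settings are my own assumption about what "single-run" should mean here: with them, the Job should not create a replacement pod if the first one fails or is evicted.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-single-run-job        # hypothetical name
spec:
  backoffLimit: 0                     # assumption: never retry a failed or evicted pod
  template:
    metadata:
      annotations:
        # Annotation values must be strings, hence the quotes around false.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never            # do not restart the container in place either
      containers:
        - name: worker                # hypothetical container name
          image: busybox:1.36         # placeholder image
          command: ["sh", "-c", "echo working; sleep 30"]
          resources:
            # requests == limits for both cpu and memory -> Guaranteed QoS
            limits:
              memory: "200Mi"
              cpu: "700m"
            requests:
              memory: "200Mi"
              cpu: "700m"
```

Once the pod is running, you can check which class it was given with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'`, which should report Guaranteed if every container in the pod has matching requests and limits.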
Read more: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod