Is it possible to configure backoffLimit globally (for example, to change the default limit from 6 to 2 for all Jobs in the cluster, without specifying backoffLimit: 2 for each Job)?
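For reference, this is the per-Job way of setting it that I would like to avoid repeating in every manifest (a minimal sketch; the names are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job              # placeholder name
spec:
  backoffLimit: 2           # the per-Job value I would like to default cluster-wide
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main          # placeholder container
        image: busybox
        command: ["echo", "hello"]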
It seems that the default values, including spec.backoffLimit, are hardcoded directly into the Kubernetes source code.
From pkg/apis/batch/v1/defaults.go:
func SetDefaults_Job(obj *batchv1.Job) {
    // For a non-parallel job, you can leave both `.spec.completions` and
    // `.spec.parallelism` unset. When both are unset, both are defaulted to 1.
    if obj.Spec.Completions == nil && obj.Spec.Parallelism == nil {
        obj.Spec.Completions = utilpointer.Int32Ptr(1)
        obj.Spec.Parallelism = utilpointer.Int32Ptr(1)
    }
    if obj.Spec.Parallelism == nil {
        obj.Spec.Parallelism = utilpointer.Int32Ptr(1)
    }
    if obj.Spec.BackoffLimit == nil {
        obj.Spec.BackoffLimit = utilpointer.Int32Ptr(6)
    }
    labels := obj.Spec.Template.Labels
    if labels != nil && len(obj.Labels) == 0 {
        obj.Labels = labels
    }
    if utilfeature.DefaultFeatureGate.Enabled(features.IndexedJob) && obj.Spec.CompletionMode == nil {
        mode := batchv1.NonIndexedCompletion
        obj.Spec.CompletionMode = &mode
    }
    if utilfeature.DefaultFeatureGate.Enabled(features.SuspendJob) && obj.Spec.Suspend == nil {
        obj.Spec.Suspend = utilpointer.BoolPtr(false)
    }
}
So I think the default cannot be changed cluster-wide without modifying the source code, at the moment.
No, it's not possible out of the box, since backoffLimit is configured per Job in the Job spec, as per the official documentation:
There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
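To observe the behavior the documentation describes, one could apply a Job whose Pods always fail; this is a minimal sketch with hypothetical names, assuming a busybox image is available:

apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-demo        # hypothetical name
spec:
  backoffLimit: 2           # fail the Job after this many retries instead of the default 6
  template:
    spec:
      restartPolicy: Never  # each failure creates a new Pod, counted against backoffLimit
      containers:
      - name: always-fails  # hypothetical container
        image: busybox
        command: ["false"]  # exits non-zero, so every Pod fails

The replacement Pods should appear with the exponential back-off delays quoted above (roughly 10s, then 20s) before the Job is marked Failed.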