I’ve created a CronJob in Kubernetes with schedule (8 * * * *), with the Job’s backoffLimit defaulting to 6 and the Pod’s restartPolicy set to Never. The pods are deliberately configured to FAIL. As I understand it, for a podSpec with restartPolicy: Never, the Job controller will try to create backoffLimit number of pods and then mark the Job as Failed, so I expected that there would be 6 pods in Error state.
This is the actual Job’s status:
```yaml
status:
  conditions:
  - lastProbeTime: 2019-02-20T05:11:58Z
    lastTransitionTime: 2019-02-20T05:11:58Z
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 5
```
Why were there only 5 failed pods instead of 6? Or is my understanding of backoffLimit incorrect?
Use spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set to 6 by default.
In short: you might not be seeing all of the created pods because the period of the CronJob's schedule is too short.
As described in the documentation:
Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job’s next status check.
If a new Job is scheduled before the Job controller has had a chance to recreate a pod (bearing in mind the back-off delay after the previous failure), the Job controller starts counting from one again.
I reproduced your issue in GKE using the following .yaml:
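To put rough numbers on this, here is a minimal Python sketch of the retry timeline. It assumes, as a simplification rather than something taken from the Kubernetes source, that every pod fails immediately and that the delay before retry n is min(10 · 2^(n−1), 360) seconds, following the quoted documentation (10s, 20s, 40s, … capped at six minutes). The helpers backoff_delay and pod_creation_offsets are hypothetical names used only for illustration.

```python
from typing import List

BASE_DELAY_S = 10       # first back-off delay from the quoted docs
MAX_DELAY_S = 6 * 60    # "capped at six minutes"
CRON_PERIOD_S = 3 * 60  # "*/3 * * * *" fires every 3 minutes


def backoff_delay(retry: int) -> int:
    """Modelled delay in seconds before retry `retry` (1-based): 10, 20, 40, ... capped."""
    return min(BASE_DELAY_S * 2 ** (retry - 1), MAX_DELAY_S)


def pod_creation_offsets(num_pods: int) -> List[int]:
    """Seconds after the first pod at which each of `num_pods` pods is created,
    assuming (hypothetically) that every pod fails the instant it starts."""
    offsets = [0]
    for retry in range(1, num_pods):
        offsets.append(offsets[-1] + backoff_delay(retry))
    return offsets


if __name__ == "__main__":
    offsets = pod_creation_offsets(6)
    print("pod creation offsets (s):", offsets)
    # 1-based indices of pods that would only appear after the next cron run
    late = [i + 1 for i, t in enumerate(offsets) if t > CRON_PERIOD_S]
    print(f"pods created after the next scheduled run ({CRON_PERIOD_S}s):", late)
```

Under these assumptions six pods appear at roughly 0, 10, 30, 70, 150 and 310 seconds, so the sixth one would be created about five minutes after the first, which is already later than the next run of a */3 schedule.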
```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hellocron
spec:
  schedule: "*/3 * * * *" # Runs every 3 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hellocron
            image: busybox
            args:
            - /bin/cat
            - /etc/os
          restartPolicy: Never
      backoffLimit: 6
  suspend: false
```
This Job will fail because the file /etc/os doesn't exist.
And here is the output of kubectl describe for one of the Jobs:
```
Name:           hellocron-1551194280
Namespace:      default
Selector:       controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
Labels:         controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
                job-name=hellocron-1551194280
Annotations:    <none>
Controlled By:  CronJob/hellocron
Parallelism:    1
Completions:    1
Start Time:     Tue, 26 Feb 2019 16:18:07 +0100
Pods Statuses:  0 Running / 0 Succeeded / 6 Failed
Pod Template:
  Labels:  controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
           job-name=hellocron-1551194280
  Containers:
   hellocron:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/cat
      /etc/os
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age  From            Message
  ----     ------                ---- ----            -------
  Normal   SuccessfulCreate      26m  job-controller  Created pod: hellocron-1551194280-4lf6h
  Normal   SuccessfulCreate      26m  job-controller  Created pod: hellocron-1551194280-85khk
  Normal   SuccessfulCreate      26m  job-controller  Created pod: hellocron-1551194280-wrktb
  Normal   SuccessfulCreate      26m  job-controller  Created pod: hellocron-1551194280-6942s
  Normal   SuccessfulCreate      25m  job-controller  Created pod: hellocron-1551194280-662zv
  Normal   SuccessfulCreate      22m  job-controller  Created pod: hellocron-1551194280-6c6rh
  Warning  BackoffLimitExceeded  17m  job-controller  Job has reached the specified backoff limit
```
Note the delay between the creation of pods hellocron-1551194280-662zv and hellocron-1551194280-6c6rh.
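As a rough check that these gaps follow the exponential back-off, this Python sketch compares the gaps implied by the event ages in the kubectl describe output with the same simplified model of 10 s delays doubling per retry and capped at six minutes. The ages are shown only at minute resolution, so the observed gaps are approximate; backoff_delay and observed_gaps_s are hypothetical helpers, not part of any Kubernetes tooling.

```python
from typing import List

# Ages (minutes ago) of the SuccessfulCreate events in the describe output,
# oldest pod first. Minute resolution makes the derived gaps approximate.
EVENT_AGES_MIN = [26, 26, 26, 26, 25, 22]


def backoff_delay(retry: int) -> int:
    """Modelled delay (s) before retry `retry`: 10 s doubling, capped at 360 s."""
    return min(10 * 2 ** (retry - 1), 360)


def observed_gaps_s(ages_min: List[int]) -> List[int]:
    """Approximate seconds between consecutive pod creations."""
    return [(older - newer) * 60 for older, newer in zip(ages_min, ages_min[1:])]


if __name__ == "__main__":
    observed = observed_gaps_s(EVENT_AGES_MIN)
    modelled = [backoff_delay(r) for r in range(1, len(EVENT_AGES_MIN))]
    for retry, (obs, mod) in enumerate(zip(observed, modelled), start=1):
        print(f"retry {retry}: observed ~{obs}s, modelled {mod}s")
```

The last gap (about 180 s between 662zv and 6c6rh) is roughly in line with the modelled 160 s before the fifth retry, while the earlier gaps are too short to resolve from minute-level ages.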