Kubernetes Jobs and the back-off limit: is the value a number of retries or minutes?

8/8/2019

I was reading the Kubernetes documentation about jobs and retries. I found this:

There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job’s next status check.

I have two questions about the above quote:

  1. Is the back-off limit measured in minutes or in number of retries? The documentation's use of the value 6 (six) is confusing: it first says the value is the number of retries, but then says "capped at six minutes".
  2. Is there a way to configure the back-off delay? As I understand it, this behavior (10s, 20s, 40s …) is the default and can't be changed.
-- Dherik
jobs
kubernetes
kubernetes-cronjob

1 Answer

8/8/2019

There is no ambiguity: .spec.backoffLimit is the number of retries. The "six minutes" in the quote refers to the cap on the delay between retries, not to the back-off limit.

The Job controller recreates failed Pods (associated with the Job) with an exponentially increasing delay (10s, 20s, 40s, ..., 360s). This delay is set by the Job controller itself.

  • If the Pod fails, a new Pod is created after 10s
  • If it fails again, a new one is created after 20s
  • If it fails again, a new one is created after 40s
  • If it fails again, the next one is created after 80s (1m 20s)
  • If it fails again, the next one is created after 160s (2m 40s)
  • If it fails again, the next one is created after 320s (5m 20s)
  • If it fails again, the next one is created after 360s (not 640s, because that would exceed the cap of 360s, or 6m)
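
The schedule listed above can be reproduced with a short Python sketch. It only models the arithmetic described in this answer (the delay doubles from 10s and is capped at 360s) and is not the Job controller's actual implementation. The function name backoff_delays and its parameters are hypothetical, chosen for illustration.

```python
def backoff_delays(failures, base_seconds=10, cap_seconds=360):
    """Return the delay (in seconds) before each recreated Pod.

    Models the schedule described above: the delay doubles after every
    failure, starting at base_seconds, and never exceeds cap_seconds.
    This is an illustrative assumption, not the controller's real code.
    """
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(failures)]


# Print the delay that follows each consecutive failure.
for failure, delay in enumerate(backoff_delays(7), start=1):
    print(f"after failure {failure}: wait {delay}s")
```

Note that the two numbers play different roles in this model: the number of retries corresponds to .spec.backoffLimit, while the 360s cap only bounds how long each individual wait can grow.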
-- Shudipta Sharma
Source: StackOverflow