I was running this to see how job restarts work in k8s:
```sh
kubectl run alpine --image=alpine --restart=OnFailure -- exit 1
```
The alpine image was already present locally, and the first failure happened within about a second. But then k8s takes 5 minutes to do 5 restarts! Why does it not retry immediately? Is there any way to reduce the time between two restarts?
Take a look at the Pod Lifecycle docs:
> Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s, …) that is capped at five minutes, and is reset after ten minutes of successful execution.
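If you want to see the back-off in action, you can watch the pod (assuming it is still named `alpine`, as in your command):

```sh
# Status flips between Error and CrashLoopBackOff while the
# RESTARTS counter climbs, with the delay doubling each time
kubectl get pod alpine --watch

# The Events section shows the "Back-off restarting failed
# container" warnings emitted between restart attempts
kubectl describe pod alpine
```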
I think there is no way to configure the back-off delay.
EDIT: There is an open issue requesting this feature.
Also, note that by using `kubectl run` you are not simulating "job restarts". Jobs are managed by the Job controller, which behaves a little differently when handling pod/container errors, since it takes into account the combination of the `restartPolicy`, `parallelism`, `completions`, and `backoffLimit` settings:
> There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set `.spec.backoffLimit` to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes.
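For comparison, here is a minimal sketch of a Job that exercises those settings (the name, image, and command are just placeholders):

```sh
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: fail-demo            # placeholder name
spec:
  completions: 1             # pods that must succeed for the Job to succeed
  parallelism: 1             # pods allowed to run at once
  backoffLimit: 4            # retries before the Job is marked failed (default 6)
  template:
    spec:
      restartPolicy: OnFailure   # Jobs only allow OnFailure or Never
      containers:
      - name: fail
        image: alpine
        # exit is a shell builtin, so it has to be invoked via sh -c
        command: ["sh", "-c", "exit 1"]
EOF
```

With `restartPolicy: OnFailure` the kubelet restarts the failed container in place, while with `Never` the Job controller creates replacement pods instead; either way, the failures count against `backoffLimit`.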