k8s - Pod restart time

3/19/2019

I was running this to see how job restart works in k8s.

kubectl run alpine --image=alpine --restart=OnFailure -- exit 1

The alpine image was already there. The first failure happened within about a second, yet k8s took 5 minutes to do 5 restarts! Why does it not retry immediately? Is there any way to reduce the time between two restarts?


-- KitKarson
kubernetes

1 Answer

3/19/2019

Take a look at the Pod Lifecycle docs:

Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution.
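
To see why 5 restarts add up to roughly 5 minutes, here is a minimal Python sketch of the schedule quoted above. It only models the documented numbers (10s base, doubling, 300s cap); the function name `restart_delays` is made up for illustration, and it assumes the container fails almost instantly, so run time and any kubelet-internal timing details are ignored.

```python
from itertools import accumulate


def restart_delays(restarts, base=10, cap=300):
    """Delay in seconds before each restart: base, 2*base, 4*base, ... capped at cap.

    Hypothetical helper that mirrors the quoted docs; it is not kubelet code.
    """
    return [min(base * 2 ** i, cap) for i in range(restarts)]


delays = restart_delays(7)
elapsed = list(accumulate(delays))  # running total of the waiting time

for n, (delay, total) in enumerate(zip(delays, elapsed), start=1):
    print(f"restart {n}: wait {delay}s, about {total}s since the first failure")
```

Under these assumptions the first five waits are 10 + 20 + 40 + 80 + 160 = 310 seconds, which matches the "5 minutes for 5 restarts" in the question. From the sixth restart on, every wait is the 300s cap.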

I think that there is no way to configure the back-off delay time.
EDIT: There is an open issue requesting this feature.

Also, note that with kubectl run you are not simulating "job restarts". Jobs are managed by the Job controller, which behaves a little differently when handling pod/container errors, as it takes into account the combination of the restartPolicy, parallelism, completions and backoffLimit settings:

There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes.
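
As a rough illustration of the Job numbers in that quote, the following Python sketch lists the back-off delay before each retry for a given .spec.backoffLimit. It is a simplified model built only from the quoted text (10s base, doubling, 360s cap, default limit of 6); `job_retry_delays` is a hypothetical helper, and the real Job controller may count failures and compute delays differently.

```python
def job_retry_delays(backoff_limit=6, base=10, cap=360):
    """Back-off delay in seconds before each retry, per the quoted Job docs.

    Assumption: one delay per retry, doubling from `base` and capped at `cap`.
    """
    return [min(base * 2 ** i, cap) for i in range(backoff_limit)]


default_delays = job_retry_delays()
print("delays with the default backoffLimit:", default_delays)
print("total waiting time in seconds:", sum(default_delays))

# With a larger limit the six-minute cap starts to apply.
print("delays with backoffLimit=8:", job_retry_delays(backoff_limit=8))
```

In this model the default limit of 6 never reaches the six-minute cap (the longest wait is 320s), and the retries together wait a little over ten minutes before the Job would be considered failed.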

-- Eduardo Baitello
Source: StackOverflow