In Kubernetes, a container restart policy can be configured with the optional field .spec.restartPolicy, which can be set to type: OnFailure. I read here that there is a cap of 300 seconds (5 minutes) on the exponential back-off delay before a failed pod is restarted. My first point of confusion: does this 300-second cap apply only to the default configuration, or does it also affect, for example, the configuration below? I am also wondering whether increasing the number of retries, for example to onFailureRetries: 6 with onFailureRetryInterval: 9, makes sense given the 300-second cap and the pressure on cluster resources. Is there a resource that helps in choosing the best configuration, or does this come down to experience, so that I just have to try and see what works for my cluster?
restartPolicy:
  type: OnFailure
  onFailureRetries: 3
  onFailureRetryInterval: 10
  onSubmissionFailureRetries: 5
  onSubmissionFailureRetryInterval: 20
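To make my understanding of the cap concrete, here is a rough Python sketch of the two timing models I am comparing. It assumes the back-off described in the Kubernetes docs (10s, 20s, 40s, ... capped at 300s) and, separately, that onFailureRetryInterval is a fixed wait in seconds between attempts. That second point is my assumption and may not match how retries are actually scheduled. The function names are hypothetical, just for illustration.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Delay in seconds before each restart: starts at `base`,
    doubles every time, and is never allowed to exceed `cap`."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


def fixed_interval_total(retries, interval):
    """Total wait in seconds if every retry waits the same fixed interval
    (an assumed reading of onFailureRetries / onFailureRetryInterval)."""
    return retries * interval


# Exponential back-off hits the 300s cap on the sixth restart.
print(backoff_delays(7))           # [10, 20, 40, 80, 160, 300, 300]

# Six retries at a fixed 9s interval, under the assumption above.
print(fixed_interval_total(6, 9))  # 54
```

If the fixed-interval reading is right, six retries at 9 seconds would finish well before the exponential back-off ever reaches the cap, which is why I am unsure whether the cap is relevant to this configuration at all.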