Is there a 'max-retries' for Kubernetes Jobs?

2/1/2016

I have batch jobs that I want to run on Kubernetes. The way I understand Jobs:

If I choose restartPolicy: Never, then when a Pod of the Job fails, it is destroyed and a replacement Pod is scheduled onto (potentially) another node. If I choose restartPolicy: OnFailure, the container is restarted inside the existing Pod. I'd consider a certain number of failures unrecoverable. Is there a way to stop the rescheduling or restarting after a certain number of attempts (or a certain period of time) and clean up the unrecoverable Jobs?
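To make the two restart policies concrete, here is a minimal sketch that builds a Job manifest as a plain Python dict and prints it as JSON (kubectl accepts JSON as well as YAML). It assumes the current batch/v1 Job API; the job name "batch-task" and the image "busybox" are hypothetical placeholders. Jobs only accept Never or OnFailure as the pod restartPolicy, so the helper rejects anything else.

```python
import json

# A Job's pod template only accepts these two restart policies ("Always" is rejected).
ALLOWED_JOB_RESTART_POLICIES = ("Never", "OnFailure")


def make_job_manifest(name: str, image: str, restart_policy: str) -> dict:
    """Build a minimal batch/v1 Job manifest as a plain dict.

    restart_policy="Never": a failed container fails its Pod, and the Job
    controller creates a replacement Pod (possibly on another node).
    restart_policy="OnFailure": the kubelet restarts the container inside
    the same Pod, on the same node.
    """
    if restart_policy not in ALLOWED_JOB_RESTART_POLICIES:
        raise ValueError(f"Job restartPolicy must be one of {ALLOWED_JOB_RESTART_POLICIES}")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    "restartPolicy": restart_policy,
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; save the output and run `kubectl apply -f job.json`.
    print(json.dumps(make_job_manifest("batch-task", "busybox", "Never"), indent=2))
```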

My current idea for a workaround is a watchdog process that looks at retryTimes and cleans up Jobs after a specified number of retries.
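The watchdog idea can be sketched as a pure function over the JSON that `kubectl get jobs -o json` prints. This is an illustration under assumptions: it uses the Job's status.failed counter (the number of Pods that ended in phase Failed) in place of the retryTimes value mentioned above, and the threshold and job names in the usage are hypothetical. With restartPolicy: OnFailure the container restarts inside one Pod, so those restarts appear in the Pod's containerStatuses[].restartCount rather than in status.failed.

```python
import json
from typing import List


def jobs_over_failure_limit(job_list_json: str, max_failures: int) -> List[str]:
    """Return names of Jobs whose status.failed count has reached max_failures.

    job_list_json is the text printed by `kubectl get jobs -o json`: a List
    object whose "items" are Job objects. status.failed counts failed Pods,
    which fits restartPolicy: Never (each retry is a new Pod).
    """
    names = []
    for job in json.loads(job_list_json).get("items", []):
        # The field may be omitted entirely until at least one Pod has failed.
        failed = job.get("status", {}).get("failed", 0)
        if failed >= max_failures:
            names.append(job["metadata"]["name"])
    return names


if __name__ == "__main__":
    # Hypothetical sample standing in for real `kubectl get jobs -o json` output.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "ok-job"}, "status": {"succeeded": 1}},
            {"metadata": {"name": "flaky-job"}, "status": {"failed": 5}},
        ]
    })
    # A watchdog would then run `kubectl delete job <name>` for each result.
    print(jobs_over_failure_limit(sample, max_failures=3))  # → ['flaky-job']
```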

-- alph486
google-compute-engine
kubernetes

2 Answers

2/2/2016

Summary of Slack discussion:

No, there is no retry limit. However, starting with v1.2 you can set a deadline on the Job with activeDeadlineSeconds. The system should back off restarts and then terminate the Job when it hits the deadline.
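A minimal sketch of a Job that sets activeDeadlineSeconds, again written as a Python dict that can be dumped to JSON for kubectl. It assumes the batch/v1 Job API; the name, image, and 600-second deadline are hypothetical. The deadline_passed helper is only an illustrative client-side check of the same arithmetic (deadline measured from the Job's status.startTime); it is not how the Job controller is implemented.

```python
import json
from datetime import datetime, timedelta, timezone


def job_with_deadline(name: str, image: str, deadline_seconds: int) -> dict:
    """Minimal Job manifest whose spec.activeDeadlineSeconds caps total run time."""
    if deadline_seconds <= 0:
        raise ValueError("activeDeadlineSeconds must be a positive integer")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Once the deadline passes, running Pods are terminated and the
            # Job is marked failed, regardless of how many restarts happened.
            "activeDeadlineSeconds": deadline_seconds,
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    "restartPolicy": "OnFailure",
                }
            },
        },
    }


def deadline_passed(start_time: str, deadline_seconds: int, now: datetime) -> bool:
    """Illustrative check: has start_time + deadline_seconds elapsed by `now`?

    start_time uses the RFC 3339 UTC form found in a Job's status.startTime,
    e.g. "2016-02-02T10:00:00Z".
    """
    started = datetime.strptime(start_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now >= started + timedelta(seconds=deadline_seconds)


if __name__ == "__main__":
    print(json.dumps(job_with_deadline("batch-task", "busybox", 600), indent=2))
    # 15 minutes after start with a 10-minute deadline.
    now = datetime(2016, 2, 2, 10, 15, tzinfo=timezone.utc)
    print(deadline_passed("2016-02-02T10:00:00Z", 600, now))  # → True
```

The deadline bounds wall-clock time rather than attempts, so a Job that fails quickly will retry several times before it is stopped, while a slow Job may be stopped on its first attempt.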

-- briangrant
Source: StackOverflow

7/25/2018
-- Dave Koston