How do I stop a CronJob from recreating failed Jobs?

6/5/2019

When for whatever reasons I delete the pod running the Job that was started by a CronJob, I immediately see a new pod being created. It is only once I delete something like six times the backoffLimit number of pods, that new ones stop being created.

Of course, if I'm actively monitoring the process, I can delete the CronJob, but what if the Pod inside the job fails when I'm not looking? I would like it not to be recreated.

How can I stop the CronJob from persisting in creating new jobs (or pods?), and wait until the next scheduled time if the current job/pod failed? Is there something similar to Jobs' backoffLimit, but for CronJobs?

-- LLlAMnYP
kubernetes
kubernetes-cronjob

1 Answer

6/6/2019

Set startingDeadlineSeconds to a large value or left unset (the default).

At the same time set .spec.concurrencyPolicy as Forbid and the CronJobs skips the new job run while previous created job is still running.

If startingDeadlineSeconds is set to a large value or left unset (the default) and if concurrencyPolicy is set to Forbid, the job will not be run if failed.

Concurrent policy field you can add to specification to defintion of your CronJob (.spec.concurrencyPolicy), but this is optional.

It specifies how to treat concurrent executions of a job that is created by this CronJob. The spec may specify only one of these three concurrency policies:

  • Allow (default) - The cron job allows concurrently running jobs
  • Forbid - The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
  • Replace - If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run

It is good to know that currency policy applies just to the jobs created by the same CronJob. If there are multiple CronJobs, their respective jobs are always allowed to run concurrently.

A CronJob is counted as missed if it has failed to be created at its scheduled time. For example, If concurrencyPolicy is set to Forbid and a CronJob was attempted to be scheduled when there was a previous schedule still running, then it would count as missed.

For every CronJob, the CronJob controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, then it does not start the job and logs the error

More information you can find here: CronJobs and AutomatedTask.

I hope it helps.

-- MaggieO
Source: StackOverflow