We run a Kubernetes cluster of web-scraping cron jobs. Everything works until a cron job starts to fail (e.g., when a site's structure changes and our scraper no longer works). Every now and then, a few failing cron jobs keep retrying to the point that they bring down the cluster. Running kubectl get cronjobs (prior to a cluster failure) shows too many jobs running for a failing cron job.
I've attempted following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.
Here is our config for reference:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
Ideally, a cron job would be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!
You can tell your Job to stop retrying using backoffLimit:
"Specifies the number of retries before marking this job failed."
In your case:
spec:
  template:
    spec:
      containers:
        - name: scrape-al
          image: 'govhawk/openstates:1.3.1-beta'
          command:
            - /opt/openstates/openstates/pupa-scrape.sh
          args:
            - al bills --scrape
      restartPolicy: Never
  backoffLimit: 3
You set backoffLimit to 3 for your Job. That means that when the CronJob creates a Job, the Job retries up to 3 times if it fails. This setting controls the Job, not the CronJob.
After a Job has failed, the CronJob still creates a new Job at the next scheduled time.
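To make the distinction concrete, here is a rough back-of-the-envelope sketch of why a per-Job retry limit does not bound the total work a failing CronJob generates. It assumes backoffLimit is honoured exactly (one initial attempt plus backoffLimit retries per Job) and that every scheduled run fails; the function name max_pod_attempts is hypothetical and only serves the illustration.

```python
def max_pod_attempts(interval_minutes, backoff_limit, hours):
    """Upper bound on pod attempts if every scheduled Job fails.

    Assumes each Job makes one initial attempt plus `backoff_limit` retries.
    """
    jobs_created = (hours * 60) // interval_minutes  # one Job per schedule tick
    attempts_per_job = backoff_limit + 1             # initial attempt + retries
    return jobs_created * attempts_per_job


# Schedule '*/15 * * * *' with backoffLimit: 3, over one day:
print(max_pod_attempts(interval_minutes=15, backoff_limit=3, hours=24))  # 384
```

Under these assumptions the limit caps each Job at 4 attempts, but the CronJob keeps creating 96 Jobs a day regardless of how the previous ones ended.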
What you want: if I understand correctly, you want to stop scheduling new Jobs once your scheduled Jobs have failed 5 times. Right?
Answer: In that case, this is not possible automatically.
Possible solution: You need to suspend the CronJob so that it stops scheduling new Jobs. Set this in the CronJob's spec:
suspend: true
You can do this manually. If you do not want to do this manually, you need to set up a watcher that watches your CronJob's status and updates the CronJob to suspend it when necessary.
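A minimal sketch of such a watcher in Python follows. It is not a definitive implementation: it assumes kubectl is on the PATH with permission to list Jobs and patch CronJobs, and that failedJobsHistoryLimit is set high enough for failed Jobs to still exist when the check runs (with the value 0 from the question they are removed, so there would be nothing to count). All function names and the threshold of 5 are hypothetical.

```python
import json
import subprocess


def owned_by(job, cronjob_name):
    """True if the Job lists the given CronJob in its ownerReferences."""
    refs = job["metadata"].get("ownerReferences") or []
    return any(r.get("kind") == "CronJob" and r.get("name") == cronjob_name
               for r in refs)


def outcome(job):
    """Return 'Failed' or 'Complete' for a finished Job, None if still running."""
    for cond in job.get("status", {}).get("conditions") or []:
        if cond.get("status") == "True" and cond.get("type") in ("Failed", "Complete"):
            return cond["type"]
    return None


def consecutive_failures(jobs, cronjob_name):
    """Count the most recent finished Jobs of the CronJob that failed in a row."""
    owned = [j for j in jobs if owned_by(j, cronjob_name)]
    # creationTimestamp is an RFC 3339 UTC string, so string order is time order.
    owned.sort(key=lambda j: j["metadata"]["creationTimestamp"], reverse=True)
    streak = 0
    for job in owned:
        result = outcome(job)
        if result is None:
            continue  # Job still running: ignore it
        if result != "Failed":
            break     # a completed Job ends the failure streak
        streak += 1
    return streak


def suspend_command(cronjob_name, namespace="default"):
    """Build the kubectl invocation that sets spec.suspend to true."""
    patch = json.dumps({"spec": {"suspend": True}})
    return ["kubectl", "patch", "cronjob", cronjob_name, "-n", namespace, "-p", patch]


def check_and_suspend(cronjob_name, namespace="default", threshold=5):
    """Suspend the CronJob if its last `threshold` finished Jobs all failed."""
    raw = subprocess.run(
        ["kubectl", "get", "jobs", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    jobs = json.loads(raw)["items"]
    if consecutive_failures(jobs, cronjob_name) >= threshold:
        subprocess.run(suspend_command(cronjob_name, namespace), check=True)
        return True
    return False
```

Calling check_and_suspend("scrape-al") periodically (for example from a small loop or another scheduled task) would suspend the CronJob after five failed Jobs in a row. Setting suspend back to false resumes scheduling once the scraper is fixed.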