Kubernetes Not Scheduling CronJob

1/12/2022

I'm running an instance of microk8s and attempting to get a CronJob to run every 60 seconds, but it's simply not working. It's my understanding that CronJobs shouldn't need any manual intervention to kick them off, but this system has been up for over a month and I never saw the pod for the cron job (in any state), so I decided to try kicking it off manually with k create job --from=cronjob/health-status-cron health-status-cron. After that, the job completed successfully:

health-status-cron-2hh96                   0/1     Completed   0          17h

I was hoping Kubernetes would then start scheduling future runs, but it didn't. Here is my manifest (parts of it are templated with Helm, but that shouldn't matter):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: health-status-cron
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-status-cron
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/system-health
          restartPolicy: OnFailure

Also of note: according to the following output, the job hasn't been scheduled in 35 days:

$ k -ntango get cronjobs
NAME                   SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
health-status-cron     * * * * *   False     0        35d             36d

At this point, I have absolutely no clue what I'm doing wrong or why this particular job isn't running. Any help is greatly appreciated.

Edit: I ended up blowing away the entire namespace and redeploying. I still don't know the underlying cause, unfortunately, but everything seems to work now.

-- senfo
cron
kubernetes
microk8s

1 Answer

1/12/2022

A few other things you can check:

  1. Do you have any cron pods in a "failed" status? If you do, describe those pods and check their logs to see why they failed (see the command sketch after this list).
  2. Did it used to work and then suddenly stop?
  3. Does the CronJob resource have anything in its events? Check with kubectl describe cronjob health-status-cron -n tango.
  4. Does the job your cron runs take more than 1 minute to complete? If so, your schedule is too aggressive, and you might want to loosen it (see the YAML sketch after this list).
  5. The CronJob controller also has some limitations you may want to check: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations. Specifically, the concept of "missed jobs": if the controller misses more than 100 scheduled runs, it will "freeze" the job and stop scheduling it. Do you scale down the cluster or similar when it is not in use? (The YAML sketch after this list shows one way to guard against this with startingDeadlineSeconds.)
  6. Do you have any custom/third-party webhooks or plugins installed in the cluster? These can interfere with pod creation.
  7. Do you have any jobs created in the namespace? Check with kubectl get jobs -n tango. If you find a ton of Job objects, check them to see why they did not generate pods.
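
For items 1, 3, 6, and 7, here is a minimal set of diagnostic commands, assuming the namespace is tango as in the question (the pod name is a placeholder):

# Item 1: look for failed cron pods, then inspect one
kubectl get pods -n tango --field-selector=status.phase=Failed
kubectl describe pod <failed-pod-name> -n tango
kubectl logs <failed-pod-name> -n tango

# Item 3: check the CronJob's events for scheduling errors
kubectl describe cronjob health-status-cron -n tango

# Item 6: list admission webhooks that could interfere with pod creation
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

# Item 7: check whether Job objects exist but never produced pods
kubectl get jobs -n tango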
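
For items 4 and 5, a sketch of how the spec could be loosened. The schedule and startingDeadlineSeconds values below are assumptions, not taken from the question; startingDeadlineSeconds also bounds the window in which the controller counts missed runs, so the 100-miss freeze can never be reached:

apiVersion: batch/v1            # stable CronJob API; the question used batch/v1beta1
kind: CronJob
metadata:
  name: health-status-cron
spec:
  schedule: "*/5 * * * *"       # every 5 minutes instead of every minute (assumption)
  startingDeadlineSeconds: 200  # misses are only counted within the last 200s,
                                # so the controller cannot accumulate 100 missed runs
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-status-cron
            image: busybox
            imagePullPolicy: IfNotPresent
            # busybox ships wget rather than curl; swapping the probe command
            # here is an assumption on my part, not part of the original answer
            command:
            - /bin/sh
            - -c
            - wget -q -O- http://restfulservices/api/system-health
          restartPolicy: OnFailure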

I encountered a somewhat similar issue in 2020 (the writeup links to the issue I raised in the Kubernetes project itself): https://blenderfox.com/2020/08/07/the-snowball-effect-in-kubernetes/

-- Blender Fox
Source: StackOverflow