Kubernetes Cronjob Only Runs Half the Time

5/8/2018

I want a job to trigger every 15 minutes but it is consistently triggering every 30 minutes.

UPDATE:

I've simplified the problem by just running:

kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"

As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/

and yet the job still refuses to run on time.

$ kubectl get cronjobs
NAME               SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello              */1 * * * *   False     1         5m              30m
hello2             */1 * * * *   False     1         5m              12m

It took 25 minutes for the command-line-created CronJob to run and 7 minutes for the CronJob created from YAML. They were both finally scheduled at the same time, so it's almost as if etcd finally woke up and did something?

ORIGINAL ISSUE:

When I drill into an active job I see Status: Terminated: Completed, but the Age is 25 minutes or otherwise greater than the 15-minute interval.

In the logs I see that the Python script meant to run has completed its final print statement. The script takes about two minutes to complete, based on its output file in S3. Then no new job is scheduled for 28 more minutes.

I have tried with different configurations:

Schedule: */15 * * * * AND Schedule: 0,15,30,45 * * * *

As well as

Concurrency Policy: Forbid AND Concurrency Policy: Replace

What else could be going wrong here?
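One field worth trying, though not confirmed as the fix here, is startingDeadlineSeconds. If the controller misses many start times and no deadline is set, it counts every miss since the last run and, past 100 missed schedules, refuses to start the job at all; setting a deadline bounds that check to a recent window. A sketch, with the 300-second value chosen arbitrarily:

```yaml
# Sketch only: same CronJob as the full config below, with a starting
# deadline added. The 300-second window is an arbitrary choice, not a
# recommendation.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: f-c-p
  namespace: extract
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 300
  jobTemplate:
    # ... unchanged from the full config below ...
```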

Full config with identifying lines modified:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    type: f-c
  name: f-c-p
  namespace: extract
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            type: f-c
        spec:
          containers:
          - args:
            - /f_c.sh
            image: identifier.amazonaws.com/extract_transform:latest
            imagePullPolicy: Always
            env:
            - name: ENV
              value: prod
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  key: slack_token
                  name: api-tokens
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: aws_access_key_id
                  name: api-tokens
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: aws_secret_access_key
                  name: api-tokens
            - name: F_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  key: f_access_token
                  name: api-tokens
            name: s-f-c
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 1
  suspend: false
status: {}
-- ProGirlXOXO
kubernetes

2 Answers

5/16/2018

Isn't that by design?

A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.

Ref. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations
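Since the docs only promise "about once" per tick, one mitigation is to make each run idempotent. A sketch in shell, with hypothetical S3 paths: key the output on the scheduled 15-minute window rather than the wall clock, so a duplicate or late run overwrites the same object instead of creating a second one.

```shell
# Derive the output key from the scheduled 15-minute window.
# The 10# prefix forces base 10, so "08" and "09" are not parsed as
# invalid octal in the arithmetic expansion.
minute=$((10#$(date -u +%M)))
window=$(printf '%02d' $(( minute / 15 * 15 )))
key="extract/$(date -u +%Y-%m-%dT%H):${window}/output.json"
echo "$key"
# A run at 12:17 and a retry at 12:29 both map to the .../12:15/ key,
# so rerunning the job overwrites rather than duplicates its output.
```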

-- Viacheslav
Source: StackOverflow

5/22/2018

After running these jobs in a test cluster I discovered that external circumstances prevented them from running as intended.

On the original cluster there were ~20k scheduled jobs. The built-in scheduler for Kubernetes is not yet capable of handling this volume consistently.

The maximum number of jobs that can be reliably run (within a minute of the time intended) may depend on the size of your master nodes.

-- ProGirlXOXO
Source: StackOverflow