I want a job to trigger every 15 minutes, but it is consistently triggering every 30 minutes.
UPDATE:
I've simplified the problem by just running:
kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"
As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
and yet the job still refuses to run on time.
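For reference, a manifest that should be roughly equivalent to that command looks like the following (same busybox image and schedule; the batch/v1beta1 API version matches the full config further down):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: '*/1 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure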
$ kubectl get cronjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     1         5m              30m
hello2    */1 * * * *   False     1         5m              12m
It took 25 minutes for the command-line-created CronJob to run and 7 minutes for the CronJob created from YAML. They were both finally scheduled at the same time, so it's almost as if etcd finally woke up and did something?
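For anyone reproducing this, the controller's scheduling decisions can be watched with standard kubectl commands (the CronJob name hello matches the test job above):

# Stream Job objects as the CronJob controller creates them
kubectl get jobs --watch
# Show events recorded against the CronJob, including when each Job was created
kubectl describe cronjob hello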
ORIGINAL ISSUE:
When I drill into an active job I see Status: Terminated: Completed, but Age: 25 minutes or something else greater than 15.
In the logs I see that the Python script meant to run has completed its final print statement. The script takes about two minutes to complete, based on its output file in S3. Then no new job is scheduled for 28 more minutes.
I have tried different configurations:
Schedule: */15 * * * *
AND Schedule: 0,15,30,45 * * * *
As well as
Concurrency Policy: Forbid
AND Concurrency Policy: Replace
What else could be going wrong here?
Full config with identifying lines modified:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    type: f-c
  name: f-c-p
  namespace: extract
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            type: f-c
        spec:
          containers:
          - args:
            - /f_c.sh
            image: identifier.amazonaws.com/extract_transform:latest
            imagePullPolicy: Always
            env:
            - name: ENV
              value: prod
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  key: slack_token
                  name: api-tokens
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: aws_access_key_id
                  name: api-tokens
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: aws_secret_access_key
                  name: api-tokens
            - name: F_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  key: f_access_token
                  name: api-tokens
            name: s-f-c
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 1
  suspend: false
status: {}
Isn't that by design?
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Ref. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations
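One field related to that limitation is spec.startingDeadlineSeconds, which bounds how late the controller may still start a job after its scheduled time. This is only a sketch of that field, not taken from the asker's config:

spec:
  schedule: '*/15 * * * *'
  # If the scheduled start is missed (e.g. the controller is busy),
  # still start the job up to 200 seconds past the scheduled time.
  startingDeadlineSeconds: 200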
After running these jobs in a test cluster, I discovered that external circumstances prevented them from running as intended.
On the original cluster there were ~20k scheduled jobs, and the built-in scheduler for Kubernetes is not yet capable of handling that volume consistently.
The maximum number of jobs that can be run reliably (within a minute of the intended time) may depend on the size of your master nodes.
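To check whether a cluster is in a similar state, counting the CronJob objects across all namespaces is enough (the flags below are standard kubectl):

# Count CronJob objects cluster-wide; a number in the tens of thousands
# may be enough to delay scheduling noticeably
kubectl get cronjobs --all-namespaces --no-headers | wc -l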