What is the recommended architecture for scheduled jobs in a Kubernetes cluster?

7/23/2018

Consider the following situation: you have a job that you want to run roughly every 24 hours, and it takes around 2 hours to complete. For example, let it be a parser scraping information from some websites.

You want it to run in your Kubernetes cluster, so you package it in a Docker image.

The Docker convention suggests treating a container as an executable, so you use your parser script as the default command in your Dockerfile:

CMD nodejs /src/parser.js

But now in Kubernetes, when the parser finishes, the container dies with it and is restarted immediately.

To work around this, you can specify some other bash script as the CMD. This script runs indefinitely and launches your parser script every 24 hours. However, this means you've lost this nice property of your image and can't just do

docker run my-parser-image

So is there a way in Kubernetes to run a container every xx hours and, if it fails, to run it again? More broadly, what is the recommended way of running scheduled containerized jobs in a Kubernetes cluster?

-- Jen
docker
google-cloud-platform
kubernetes

1 Answer

7/23/2018

One way you can approach this is by creating a CronJob object in Kubernetes:

apiVersion: batch/v1beta1  # use batch/v1 on Kubernetes 1.21 and later
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"  # once a day at 00:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-parser-cronjob
            image: my-parser-image
          restartPolicy: OnFailure
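
Since the question also asks about re-running on failure, and the job takes about 2 hours, here is a rough sketch of a few optional fields that could be tuned. The name my-parser and all numeric values are assumptions chosen for illustration, not recommendations:

```yaml
apiVersion: batch/v1              # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: my-parser                 # hypothetical name
spec:
  schedule: "0 0 * * *"           # once a day at 00:00
  concurrencyPolicy: Forbid       # skip a new run if the previous one is still running
  startingDeadlineSeconds: 3600   # assumed value: a missed run may still start within 1 hour
  jobTemplate:
    spec:
      backoffLimit: 3             # assumed value: retries before the Job is marked failed
      template:
        spec:
          containers:
          - name: my-parser-cronjob
            image: my-parser-image
          restartPolicy: OnFailure  # restart the container in place when it exits non-zero
```

With this layout the image keeps the parser as its default command, so docker run my-parser-image still works unchanged.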

A similar option is the Job object, but keep in mind that a Job runs only once, until completion, and is not rescheduled.
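
For comparison, a minimal sketch of such a one-off Job might look like the following. The name my-parser-once and the backoffLimit value are assumptions for illustration only:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-parser-once      # hypothetical name
spec:
  backoffLimit: 3           # assumed value: retries before the Job is marked failed
  template:
    spec:
      containers:
      - name: my-parser
        image: my-parser-image
      restartPolicy: OnFailure  # a Job allows only OnFailure or Never
```

Saved as, say, job.yaml, it could be submitted with kubectl apply -f job.yaml. It runs the parser once and does not repeat on a schedule.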

-- Luminance
Source: StackOverflow