Is a NoRestart policy supported for Kubernetes pods?

7/28/2015

I'm running batch computations (Monte Carlo) from a Docker image, as multiple jobs on Google Cloud managed by Kubernetes. But something (the replication controller, I guess?) keeps restarting the same computation again and again because of the default restart policy.

Is there a way now to let pods die? Or are there other workarounds, such as garbage-collecting the pods?

-- Severin Pappadeux
docker
google-compute-engine
kubernetes

2 Answers

7/28/2015

Now that v1.0 is out, better native support for batch computations is one of the team's top priorities, but it is already quite possible to run them.

If you run something as a pod rather than as a replication controller, you can set the restartPolicy field on it. The OnFailure policy is probably what you'd want: Kubernetes will restart a pod that exited with a non-zero exit code, but won't restart a pod that exited with zero. If the pod should never be restarted regardless of its exit code, the Never policy does that.
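As a minimal sketch of that approach, a bare pod manifest might look like the following. The pod name, container name, image, and command are placeholders chosen for illustration and do not come from the question.

```yaml
# pod.yaml: a standalone pod, not managed by a replication controller
apiVersion: v1
kind: Pod
metadata:
  name: monte-carlo-run        # hypothetical name
spec:
  # OnFailure: restart only after a non-zero exit code.
  # Use Never to leave the pod alone whatever the exit code.
  restartPolicy: OnFailure
  containers:
  - name: worker               # hypothetical name
    image: debian              # placeholder; substitute your own image
    command: ["bash", "-c", "echo simulating; exit 0"]
```

It could then be submitted with kubectl create -f pod.yaml, which creates just the pod.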

If you're using kubectl run to start your pods, though, I'm unfortunately not aware of a way to have it create just a pod rather than a replication controller. If you'd like something like that, it'd be great if you opened an issue requesting it as an option.

-- Alex Robinson
Source: StackOverflow

12/2/2015

As of November 2015, Kubernetes v1.1.1 provides a Jobs API: https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/jobs.md

The following is a simple job that runs the date command once per second for 60 seconds:

$ cat job.yaml
apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      name: example
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: debian
        command: ["timeout", "60", "bash", "-c", "while sleep 1; do date; done"]
      restartPolicy: Never

Run the job on your Kubernetes cluster:

$ cluster/kubectl.sh create -f job.yaml
job "example" created

Find the name of the pod the job created:

$ cluster/kubectl.sh get pods
NAME            READY     STATUS              RESTARTS   AGE
example-3nxin   1/1       Running             0          15s

Now check the logs for the pod:

$ cluster/kubectl.sh logs example-3nxin
Sat Dec  5 04:47:12 UTC 2015
Sat Dec  5 04:47:13 UTC 2015
Sat Dec  5 04:47:14 UTC 2015
Sat Dec  5 04:47:15 UTC 2015
Sat Dec  5 04:47:16 UTC 2015
Sat Dec  5 04:47:17 UTC 2015
Sat Dec  5 04:47:18 UTC 2015
Sat Dec  5 04:47:19 UTC 2015
Sat Dec  5 04:47:20 UTC 2015
Sat Dec  5 04:47:21 UTC 2015
Sat Dec  5 04:47:22 UTC 2015
Sat Dec  5 04:47:23 UTC 2015
Sat Dec  5 04:47:24 UTC 2015
Sat Dec  5 04:47:25 UTC 2015
Sat Dec  5 04:47:26 UTC 2015
Sat Dec  5 04:47:27 UTC 2015
Sat Dec  5 04:47:28 UTC 2015
Sat Dec  5 04:47:29 UTC 2015

Optionally, you can set restartPolicy to OnFailure, so that the job is restarted if it exits with a non-zero exit status.

-- peteridah
Source: StackOverflow