Kubernetes pods N:M scheduling how-to

7/29/2015

Batch computations, Monte Carlo, using Docker image, multiple jobs running on Google cloud and managed by Kubernetes. No Replication Controllers, just multiple pods with NoRestart policy delivering computed payloads to our server. So far so good. Problem is, I have cluster with N nodes/minions, and have M jobs to compute, where M > N. So I would like to fire M pods at once and tell Kubernetes to schedule it in such a way so that only N are running at a given time, and everything else is kept in Pending state. As soon as one pod is done, next is scheduled to run moving from Pending to Running and so on and so forth till all M pods are done.

Is it possible to do so?

-- Severin Pappadeux
kubernetes

2 Answers

7/29/2015

Yes, you can have them all ask for a resource of which there's only one on each node, then the scheduler won't be able to schedule more than N at a time. The most common way to do this is to have each pod ask for a hostPort in the ports section of its containers spec.

However, I can't say I'm completely sure why you would want to limit the system to one such pod per node. If there are enough resources available to run multiple at a time on each node, it should speed up your job to let them run.

-- Alex Robinson
Source: StackOverflow

7/30/2015

Just for the record, after discussion with Alex, trial and error and a binary search for a good number, what worked for me was setting the CPU resource limit in the Pod JSON to:

    "resources": {
        "limits": {
            "cpu": "490m"
        }
    }

I have no idea how and why this particular value influences the Kubernetes scheduler, but it keeps nodes churning through the jobs, with exactly one pod per node running at any given moment.

-- Severin Pappadeux
Source: StackOverflow