Have kube jobs start on waiting pods

12/11/2018

I am working on a scenario where I want to maintain some X number of pods in waiting (and managed by kube) and then, upon user request (via some external system), have a kube job start on one of those waiting pods. The waiting pod count then becomes X-1, and kube starts another pod to bring it back to X. This way I can cut down on the time taken to create a pod, start a container, and get it ready for actual processing. The processing data can be sent to those pods via some sort of messaging (akka or rabbitmq). I think ReplicationControllers are the best place to keep idle pods, but when I create a job, how can I specify that it should use one of the waiting pods that are managed by the ReplicationController?

-- Mayank
kubernetes
kubernetes-jobs

1 Answer

12/12/2018

I think I got this to work up to a state on top of which I can build the rest of this solution.
So what I am doing is starting an RC with replicas: X (X is the number of idle pods I wish to maintain, usually not a very large number). The pods that it starts have a custom label status: idle or something like that. The RC's spec.selector has the same custom label value so that it matches the pods it manages, i.e. spec.selector.status: idle. When creating this RC, kube ensures that it creates X pods with status=idle. Something like the following:
apiVersion: v1
kind: ReplicationController
metadata:
  name: testrc
spec:
  replicas: 3
  selector:
    status: idle
  template:
    metadata:
      name: idlepod
      labels:
        status: idle
    spec:
      containers:
        ...

On the other hand, I have a Job YAML that has spec.manualSelector: true (and yes, I have taken into account that the label set has to be unique). With manualSelector enabled, I can now define selectors on the Job like below.

apiVersion: batch/v1
kind: Job
metadata:
  generateName: testjob-
spec:
  manualSelector: true
  selector:
    matchLabels:
      status: active
  ...

So clearly, the RC creates pods with status=idle, and the Job expects to use pods with status=active because of its selector.
So whenever I get a request to start a new job, I update the label on one of the pods managed by the RC so that its status=active. The RC's selector no longer matches that pod, so the RC releases it from its control and starts another one because of the replicas: X set on it. The released pod is no longer controlled by the RC and is now an orphan. Finally, when I create a job, the selector on the job template matches the label of the orphaned pod, and that pod is then controlled by the new job. I send messages to this pod to start the actual processing and finally bring it to completion.
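A minimal sketch of that relabel-and-adopt step with kubectl, assuming the RC manifest above is already running and the Job manifest is saved as testjob.yaml (a hypothetical filename; <idle-pod-name> is a placeholder for whichever idle pod you pick):

# List the idle pods currently managed by the RC (labels as in the manifests above)
kubectl get pods -l status=idle

# Relabel one of them; the RC's selector no longer matches it, so the RC
# releases the pod and starts a replacement to get back to replicas: 3
kubectl label pod <idle-pod-name> status=active --overwrite

# Create the Job; its selector (status=active) matches the orphaned pod,
# which the Job then takes over
kubectl create -f testjob.yaml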


-- Mayank
Source: StackOverflow