How to use a job queue in a Kubernetes cluster

12/26/2019

I have a Flask app where I can upload a file and call a Flask API that processes the file using the Python subprocess module (which runs a shell command).
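Roughly, the endpoint looks something like this (simplified sketch; the route, upload path, and shell command are placeholders, not the real code):

    import subprocess
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/process", methods=["POST"])
    def process_file():
        # Save the uploaded file somewhere the processing command can read it
        uploaded = request.files["file"]
        path = f"/tmp/{uploaded.filename}"   # placeholder upload location
        uploaded.save(path)
        # Hand the file off to an external shell command via subprocess
        subprocess.run(["process-file", path], check=True)
        return {"status": "processed"}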

How can I make this work with Kubernetes so that the file processing is spread across the cluster?

Suppose I have 2 nodes and each node can process 3 files at a time. So if I have 10 files pending processing, 6 of them will run on the 2 nodes and 4 will wait in a queue.

I have looked at Kubernetes autoscaling, but it seems it will spin up as many nodes as needed. If I have a fixed number of nodes and call my Flask API many times, the cluster will run out of resources.

--
kubernetes
python

1 Answer

12/27/2019

You want the following behavior: if the API is called 20 times at once, the cluster would run out of resources. If the cluster can run 6 pods at a time before reaching 90% or 100% CPU/memory usage (this number can change with the cluster size), the remaining 14 pods should wait in a queue.

You can create a Kubernetes Job and set appropriate values for parallelism and completions, e.g. parallelism: 6 to run up to 6 pods of the Job in parallel, and completions: 20 if you have 20 items to process.
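A minimal Job manifest along those lines could look like this (the image name and command are placeholders; parallelism and completions are the fields being illustrated):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: file-processing
    spec:
      parallelism: 6      # run at most 6 pods of this Job at the same time
      completions: 20     # the Job is done after 20 pods finish successfully
      template:
        spec:
          containers:
          - name: worker
            image: your-registry/file-processor:latest   # placeholder image
            command: ["process-file", "/data/input"]     # placeholder command
          restartPolicy: Never

You apply it with kubectl apply -f and Kubernetes schedules the pods across whatever nodes have free capacity.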

Problems

But there are several problems with using Kubernetes Jobs for this. First, there is no built-in way to assign input files to the Job's pods, so the pods need their own logic for that. And if you don't want to start all 20 pods at the same time, you should use a queue service to handle a work queue, e.g. Kafka or NATS.
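As a rough illustration of the work-queue approach (a sketch only, assuming a Redis list is used as the queue; the queue name, hostname, and processing command are made up):

    # Worker pod: pulls file names from a Redis list and processes them one by one.
    # Assumes the uploaded files sit on a shared volume and pending file names are
    # pushed onto a "jobs" list by the Flask API.
    import subprocess
    import redis

    r = redis.Redis(host="redis", port=6379)

    while True:
        # blpop blocks until an item is available and returns (key, value)
        _, filename = r.blpop("jobs")
        subprocess.run(["process-file", filename.decode()], check=True)

The Flask API then only pushes file names onto the queue (r.rpush("jobs", filename)) instead of processing them itself, and the number of worker pods you run caps how many files are processed in parallel.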

-- Jonas
Source: StackOverflow