I need scalable queue handling based on a Docker/Python worker. My thoughts went towards Kubernetes, but I am unsure about the best controller/service to use.
Incoming HTTP traffic hits Azure Functions, which add simple messages to a storage queue. Those messages need to be worked on and the results fed back into a result queue.
To process those queue messages I developed Python code that loops over the queue and works on the jobs. After each successful iteration, the message is removed from the source queue and the result is written into the result queue. Once the queue is empty, the code exits.
So I created a Docker image that runs the Python code. If more than one container is started, the queue obviously gets worked through faster. I also set up the new Azure Kubernetes Service (AKS) to scale that. Being new to Kubernetes, I read about the Job paradigm for working on a queue until the Job is done. My simple YAML template looks like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  parallelism: 4
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: c
        image: repo/image:tag
      restartPolicy: OnFailure  # Jobs only accept Never or OnFailure
My problem now is that the Job cannot be restarted.
Usually the queue gets filled with some entries and then nothing happens for a while. Then bigger batches can arrive that need to be worked through as fast as possible. Of course I want to run the Job again at that point, but that does not seem to be possible. I also want to reduce the footprint to a minimum when nothing is in the queue.
So my question is: what architecture/constructs should I use for this scenario, and are there simple YAML examples for it?
This may be a "goofy/hacky" answer, but it's simple, robust, and I've been using it in a production system for months now.
I have a similar system where a queue sometimes is emptied out and sometimes gets slammed. I wrote my queue processor similarly: it handles one message in the queue at a time and terminates if the queue is empty. It is set up to run in a Kubernetes Job.
The trick is this: I created a CronJob to regularly start one single new instance of the Job, and the Job allows infinite parallelism. If the queue is empty, the new instance terminates immediately ("scales down"). If the queue is slammed and the last Job hasn't finished yet, another instance starts ("scales up").
No need to futz with querying the queue and scaling a StatefulSet or anything, and no resources are consumed if the queue is sitting empty. You may have to adjust the CronJob interval to fine-tune how fast it reacts to the queue filling up, but it should react pretty well.
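For illustration, here is a minimal sketch of what such a CronJob could look like, reusing the repo/image:tag image from the question; the name, schedule and concurrencyPolicy are assumptions you would tune for your own queue:

apiVersion: batch/v1            # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: myjob-cron
spec:
  schedule: "*/5 * * * *"       # start one new worker instance every 5 minutes
  concurrencyPolicy: Allow      # let runs overlap so a busy queue gets extra workers
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: c
            image: repo/image:tag
          restartPolicy: OnFailure

With concurrencyPolicy: Allow, runs are permitted to overlap, so a busy queue accumulates extra workers, while on an empty queue each new run exits almost immediately.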
This is a common pattern, and there are several ways to architect a solution.
A common solution is to have an app with a set of workers always polling your queue (this could be your Python script, but you need to turn it into a long-running service). Generally you'll want to use a Kubernetes Deployment for that, possibly with a Horizontal Pod Autoscaler based on a queue metric or on CPU.
In your case, you'll want to make your script a daemon that polls the queue for new items (I assume you are already handling race conditions with parallelism). Then deploy this daemon using a Kubernetes Deployment, and you can scale it up and down based on metrics or on a schedule, roughly as sketched below.
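A rough sketch of that setup, assuming the same repo/image:tag image is changed to poll forever instead of exiting; the names, replica counts and CPU target are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: queue-worker
  template:
    metadata:
      labels:
        app: queue-worker
    spec:
      containers:
      - name: worker
        image: repo/image:tag       # the script must poll forever instead of exiting
        resources:
          requests:
            cpu: 100m               # a CPU request is needed for CPU-based autoscaling
---
apiVersion: autoscaling/v2          # autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

CPU utilization is the simplest metric to start with; scaling on the queue length itself requires exposing it to the HPA through a custom or external metrics adapter.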
There are also job schedulers out there for many different languages. One very popular option is Airflow, which already has the concept of 'workers', but that may be overkill for a single Python script.