I have a data processing app that I've been scaling with Docker by provisioning servers manually and then running containers on them.
Here's what the architecture looks like:
The Tasker app finds data that needs to be processed and hits the API with the context. The Tasker also loops through the 5-10 WORKER servers that are currently running and checks how many tasks each one has in flight, making sure no server exceeds 100 tasks so that processing stays timely.
Each worker keeps a count of the tasks it is currently processing using this library: https://github.com/samrith-s/concurrent-tasks
The worker defines lots of data types and the processing function for each one.
This strategy is not scalable; it turns into a huge mess over time, which is why I'm looking into a solution built on Kubernetes.
I'm looking for a Kubernetes cluster that can run these workers and scale them up and down automatically based on the amount of queued work.
I've been reading up on RabbitMQ and Celery, so I'm familiar with those concepts.
Should I go with this Kubernetes strategy, or just use a better queue system like Bull?
Kubernetes should work well for this.
Have the Tasker app find work to be done and publish one task message per unit of work to RabbitMQ.
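A minimal sketch of the publishing side, assuming a Node.js Tasker and the `amqplib` client; the queue name `tasks` and the `AMQP_URL` env var are placeholders:

```ts
import amqp from "amqplib";

// Tasker side: publish one durable message per unit of work.
async function publishTasks(payloads: object[]) {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("tasks", { durable: true }); // queue survives broker restarts

  for (const payload of payloads) {
    ch.sendToQueue("tasks", Buffer.from(JSON.stringify(payload)), {
      persistent: true, // message is written to disk
    });
  }

  await ch.close();
  await conn.close();
}
```

With this in place the Tasker no longer needs to poll workers or count their tasks; the queue holds the backlog for you.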
Set the worker app to consume the messages and do the work. Use a Kubernetes Horizontal Pod Autoscaler to scale the worker Deployment based on the number of jobs queued in RabbitMQ. Note that the HPA only scales on CPU and memory out of the box, so for queue-length scaling you'll need an external-metrics source such as KEDA, which ships a RabbitMQ scaler.
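A sketch of the consuming side under the same assumptions (`amqplib`, a `tasks` queue, and a hypothetical `handleTask` standing in for your per-data-type processing functions). The prefetch limit replaces your manual 100-tasks-per-server cap: each pod only ever holds that many unacknowledged messages.

```ts
import amqp from "amqplib";

// Worker side: consume with a bounded prefetch so each pod takes
// only as much work as it can handle at once.
async function runWorker() {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("tasks", { durable: true });
  await ch.prefetch(10); // at most 10 in-flight messages per pod

  await ch.consume("tasks", async (msg) => {
    if (!msg) return;
    try {
      await handleTask(JSON.parse(msg.content.toString()));
      ch.ack(msg); // only ack once the work has succeeded
    } catch {
      ch.nack(msg, false, false); // reject without requeue (dead-letter it)
    }
  });
}

// Placeholder for the per-data-type processing the worker already defines.
async function handleTask(task: unknown): Promise<void> {
  /* ... */
}
```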
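If you go the KEDA route, the scaling rule is a small manifest rather than app code. Here's a rough sketch of a `ScaledObject`; the names, replica bounds, and the 100-messages-per-replica target are placeholders to adapt:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker            # the worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        mode: QueueLength   # scale on messages waiting in the queue
        value: "100"        # target ~100 queued messages per replica
        hostFromEnv: AMQP_URL
```

KEDA creates and drives the HPA for you, so workers scale out when the queue grows and back down when it drains, which is exactly the part you're doing by hand today.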