Creating a Node.js data processing app with Kubernetes and autoscaling

9/13/2019

I have a data processing app that I've been scaling by hand: I create servers manually and then run Docker containers on them.

Here's what the architecture looks like:

1. Tasker (tasker.example.com)

The tasker app finds data that needs to be processed and then hits a worker's API with the task context. The tasker also loops through the 5-10 WORKER servers that are currently running and checks how many tasks each one is running. It makes sure no server goes over a limit of 100 tasks, in order to maintain timely processing (roughly like the sketch below).
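A simplified sketch of that dispatch loop (the `/status` and `/tasks` endpoints and the axios dependency are placeholders for illustration, not my actual code):

```js
import axios from 'axios';

const WORKERS = ['worker1.example.com', 'worker2.example.com'];
const MAX_TASKS_PER_WORKER = 100;

async function dispatch(task) {
  for (const host of WORKERS) {
    // Ask the worker how many tasks it currently has in flight.
    const { data } = await axios.get(`http://${host}/status`);
    if (data.runningTasks < MAX_TASKS_PER_WORKER) {
      // Send the task context to the first worker with spare capacity.
      await axios.post(`http://${host}/tasks`, task);
      return host;
    }
  }
  throw new Error('All workers are at capacity');
}
```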

2. Worker (the data processing app)

Each worker keeps a count of the tasks it is currently processing, using this library: https://github.com/samrith-s/concurrent-tasks
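Usage looks roughly like this (a sketch based on the library's README; `processDataItem` is a placeholder for one of the per-type processing functions):

```js
import TaskRunner from 'concurrent-tasks';

// One runner per worker process; `concurrency` caps how many tasks
// run simultaneously.
const runner = new TaskRunner({ concurrency: 10 });

// Each task is a function that receives a `done` callback and calls
// it when the work has finished (successfully or not).
runner.add((done) => {
  processDataItem()                 // placeholder processing function
    .then(() => done())
    .catch((err) => {
      console.error(err);
      done();
    });
});

runner.start();
```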

Many data types, each with its own processing function, are defined in the worker.

Problem

This strategy is not scalable: provisioning servers by hand and tracking them in the tasker turns into a huge mess as things grow. That's why I'm looking into a solution with Kubernetes.

What I want to achieve

A Kubernetes cluster that can:

  1. Expose an API where I can send all the tasks from my TASKER app: a single endpoint for all tasks to flood into (see the sketch after this list)
  2. Distribute the tasks among WORKER containers based on their limits and current queue length
  3. Spin up new WORKER pods if a job cannot be queued on the existing ones
  4. Keep a log of which job was sent to which worker and its success or failure
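For point 1, I picture a single intake endpoint that accepts a task and hands it straight off to a queue, something like this sketch (Express and amqplib are illustrative choices here; the queue name `tasks` is a placeholder):

```js
import express from 'express';
import amqp from 'amqplib';

async function main() {
  // One connection/channel for the process; 'tasks' is a placeholder queue.
  const conn = await amqp.connect(process.env.AMQP_URL);
  const channel = await conn.createChannel();
  await channel.assertQueue('tasks', { durable: true });

  const app = express();
  app.use(express.json());

  // The single intake endpoint all tasks flood into.
  app.post('/tasks', (req, res) => {
    channel.sendToQueue('tasks', Buffer.from(JSON.stringify(req.body)), {
      persistent: true, // survive broker restarts
    });
    res.status(202).json({ queued: true });
  });

  app.listen(3000);
}

main().catch(console.error);
```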

I've been reading up on RabbitMQ and Celery, so I'm familiar with those concepts.

Should I go with this Kubernetes strategy, or just a better queue system like Bull?

-- Ahmed
docker
kubernetes
node.js

1 Answer

9/14/2019

Kubernetes should work well for this.

Have the Tasker app find work to be done and publish task messages to RabbitMQ.
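A minimal producer sketch with amqplib (the queue name and connection URL are placeholders):

```js
import amqp from 'amqplib';

async function publishTasks(tasks) {
  const conn = await amqp.connect(process.env.AMQP_URL);
  const channel = await conn.createChannel();
  await channel.assertQueue('tasks', { durable: true });

  for (const task of tasks) {
    channel.sendToQueue('tasks', Buffer.from(JSON.stringify(task)), {
      persistent: true, // keep messages across broker restarts
    });
  }

  await channel.close();
  await conn.close();
}
```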

Set the worker app to consume the messages and do the work. Then use a Kubernetes Horizontal Pod Autoscaler to scale the worker deployment based on the number of jobs queued up in RabbitMQ. Note that the HPA only sees CPU and memory out of the box; to scale on queue depth, you have to expose it as a custom/external metric, for example via a metrics adapter.
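A minimal consumer sketch with amqplib (again, `tasks` and `processTask` are placeholders; `prefetch` caps unacknowledged messages per pod so work spreads across replicas):

```js
import amqp from 'amqplib';

async function main() {
  const conn = await amqp.connect(process.env.AMQP_URL);
  const channel = await conn.createChannel();
  await channel.assertQueue('tasks', { durable: true });

  // Take at most 10 unacknowledged messages at a time.
  channel.prefetch(10);

  channel.consume('tasks', async (msg) => {
    if (msg === null) return; // consumer was cancelled
    try {
      const task = JSON.parse(msg.content.toString());
      await processTask(task);           // placeholder processing function
      channel.ack(msg);                  // success: remove from queue
    } catch (err) {
      console.error(err);
      channel.nack(msg, false, false);   // failure: reject without requeue
    }
  });
}

main().catch(console.error);
```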
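And a hedged sketch of the HPA itself, assuming a metrics adapter (e.g. prometheus-adapter) already exposes queue depth as an external metric named `rabbitmq_queue_messages_ready` (all names and numbers here are illustrative, not prescribed):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker              # the worker Deployment to scale
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready
        target:
          type: AverageValue
          averageValue: "100" # ~100 queued jobs per pod, echoing the current limit
```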

-- Collin Krawll
Source: StackOverflow