How is Python scaling with Gunicorn and Kubernetes?

9/22/2018

I am going to deploy a Python Flask server with Docker on Kubernetes, using Gunicorn with Gevent/Eventlet asynchronous workers. The application will:

  1. Subscribe to around 20 different topics on Apache Kafka.
  2. Score some machine learning models with that data.
  3. Upload the results to a relational database.

Each topic in Kafka will receive 1 message per minute, so the application needs to consume around 20 messages per minute from Kafka. Handling and executing each message takes around 45 seconds. The question is: how can I scale this in a good way? I know that I can add multiple workers in Gunicorn and use multiple replicas of the pod when I deploy to Kubernetes. But is that enough? Will the workload be automatically balanced between the available workers in the different pods? Or is there something else I can do to ensure scalability?
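Roughly, the per-worker processing I have in mind looks like this minimal sketch (the topic names, broker address, consumer group, and helper functions are placeholders, not my real code):

```python
# Minimal sketch of the consumer loop; all names and helpers are placeholders.
import json
from kafka import KafkaConsumer  # kafka-python

TOPICS = ["telemetry-%02d" % i for i in range(20)]  # ~20 topics, 1 msg/min each


def score_model(topic, payload):
    # Placeholder for the ~45 s model-scoring step.
    return {"topic": topic, "score": 0.0}


def save_result(result):
    # Placeholder for the upload to the relational database.
    pass


consumer = KafkaConsumer(
    *TOPICS,
    bootstrap_servers="kafka:9092",
    group_id="model-scorers",        # consumers in one group share the partitions
    enable_auto_commit=False,
)

for message in consumer:
    payload = json.loads(message.value)
    save_result(score_model(message.topic, payload))
    consumer.commit()                # commit only after the result is stored
```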

-- danielo
docker
flask
gunicorn
kubernetes
python

1 Answer

9/22/2018

I recommend you set up an HPA (Horizontal Pod Autoscaler) for your workers.
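For example, something along these lines with the official Python Kubernetes client creates an HPA that scales a Deployment on average CPU utilization. The deployment name, namespace, and thresholds below are placeholders; an equivalent YAML manifest or `kubectl autoscale` does the same thing:

```python
# Sketch: create an HPA for an existing Deployment (names and limits are placeholders).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="flask-workers-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="flask-workers"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,  # scale out above 70% average CPU
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```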

It will require setting up support for the metrics API. Note that on later versions of Kubernetes, Heapster has been deprecated in favor of the metrics server.

If you are using a public cloud like AWS, GCP, or Azure, I'd also recommend setting up an autoscaling group so that you can scale your VMs (the cluster nodes) based on metrics like average CPU utilization.
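On AWS, for instance, that could be a target-tracking scaling policy attached to the nodes' Auto Scaling group; here is a rough boto3 sketch where the group name, region, and target value are placeholders (GCP managed instance groups and Azure VM scale sets offer equivalent features):

```python
# Sketch: attach a target-tracking policy to an existing Auto Scaling group
# so the node pool grows and shrinks with average CPU (names/values are placeholders).
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="k8s-worker-nodes",
    PolicyName="scale-on-avg-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,  # keep average CPU around 70%
    },
)
```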

Hope it helps!

-- Rico
Source: StackOverflow