I am going to deploy a Python Flask server with Docker on Kubernetes, using Gunicorn with Gevent/Eventlet as asynchronous workers. The application will consume messages from Kafka topics.
Each Kafka topic receives 1 message per minute, so the application needs to consume around 20 messages per minute in total. Handling each message takes around 45 seconds. The question is: how can I scale this well? I know I can run multiple Gunicorn workers and use multiple replicas of the pod when I deploy to Kubernetes, but is that enough? Will the workload be balanced automatically across the available workers in the different pods? What else can I do to ensure scalability?
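A quick back-of-envelope calculation from the numbers above shows the minimum concurrency the deployment must provide (the figures are taken directly from the question; the calculation is just arithmetic, not a capacity-planning tool):

```python
import math

# Figures from the question: 20 topics x 1 message/minute,
# ~45 seconds of processing per message.
messages_per_minute = 20
processing_seconds = 45

# Seconds of processing work arriving per minute.
work_per_minute = messages_per_minute * processing_seconds  # 900 s

# Each concurrently running handler supplies 60 s of processing
# capacity per minute, so this is the minimum concurrency needed
# just to keep up with the incoming rate.
min_concurrent_handlers = math.ceil(work_per_minute / 60)

print(min_concurrent_handlers)  # -> 15
```

So across all Gunicorn workers and pod replicas combined, at least ~15 messages must be in flight at any moment, or the consumers will fall behind.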
I recommend you set up an HPA (Horizontal Pod Autoscaler) for your workers.
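A minimal HPA manifest could look like the following sketch — the Deployment name `flask-worker`, the replica bounds, and the 70% CPU target are placeholders you'd tune for your own workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-worker        # placeholder: your Deployment's name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when avg CPU exceeds 70%
```

Apply it with `kubectl apply -f hpa.yaml`; the HPA controller will then adjust the replica count between `minReplicas` and `maxReplicas` based on observed CPU utilization.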
It requires the metrics API to be available in your cluster. On later versions of Kubernetes, Heapster has been deprecated in favor of the metrics server, which supplies the resource metrics (CPU and memory) the HPA consumes; custom metrics require an additional metrics adapter.
If you are using a public cloud like AWS, GCP, or Azure, I'd also recommend setting up an autoscaling group so that you can scale your VMs (nodes) based on metrics like average CPU utilization.
Hope it helps!