How to automatically scale the number of pods based on load?

11/16/2018

We have a service that is idle most of the time, so it would be great if we could delete all of its pods when the service has not received any requests for, say, 30 minutes. When the next request comes in, Kubernetes would create the first pod and process the request.

Is it possible to set the min pod instance count to 0?

I found that Kubernetes does not currently support this. Is there a way I can achieve it?

-- Gautam Moulik
kubernetes
kubernetes-helm

2 Answers

11/16/2018

Kubernetes does not support this the way it is supported by web servers like nginx and Apache, app servers like Puma, Passenger, Gunicorn, and Unicorn, or even Google App Engine Standard. Those can be soft-started and then brought up the moment the first request comes in; the downside is that your first request will always be slower. (There may have been some rationale behind Kubernetes pods not behaving this way, and I can see it requiring a lot of design changes or a new type of workload for this very specific case.)

A pod that is sitting idle would not consume many resources. You could tweak the pod's resource requests and limits so that it requests a small amount of CPU and memory and sets the limits to a higher amount. The upside of having a pod always running is that, in theory, your first request never has to wait a long time for a response.
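As a rough sketch of the request/limit idea, the Python snippet below builds a Deployment manifest with small requests and higher limits and prints it as JSON, which `kubectl apply -f -` accepts. The service name, image, and every CPU/memory figure are made-up placeholders, not recommendations; tune them for your own workload.

```python
import json


def build_deployment(name, image):
    """Return an apps/v1 Deployment manifest with low requests and higher limits."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # One pod stays up, so the first request never waits for a cold start.
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {
                                # Assumed values: what the scheduler reserves while the pod is idle.
                                "requests": {"cpu": "50m", "memory": "64Mi"},
                                # Assumed values: the ceiling the container may use under load.
                                "limits": {"cpu": "500m", "memory": "256Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image.
    manifest = build_deployment("idle-service", "example.com/idle-service:1.0")
    print(json.dumps(manifest, indent=2))
```

Saved under a hypothetical name such as `deployment.py`, the output can be applied with `python deployment.py | kubectl apply -f -`.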

-- Rico
Source: StackOverflow

11/16/2018

Yes. You can achieve that using the Horizontal Pod Autoscaler.

For an example, see the Horizontal Pod Autoscaler Walkthrough in the Kubernetes documentation.
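To give a feel for what that looks like, here is a small Python sketch that builds an `autoscaling/v1` HorizontalPodAutoscaler manifest and mimics the documented scaling rule, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The target name and all numbers are assumptions for illustration, and the helper is a simplification that ignores the controller's tolerance and stabilization behavior. Note that the result is clamped to `minReplicas`, which is 1 here, so on its own this does not take the service down to zero pods.

```python
import json
import math


def build_hpa(target, min_replicas=1, max_replicas=5, cpu_percent=50):
    """Return an autoscaling/v1 HPA manifest targeting a Deployment named `target`."""
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": target},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Average CPU utilization (percent of requests) the autoscaler aims for.
            "targetCPUUtilizationPercentage": cpu_percent,
        },
    }


def desired_replicas(current_replicas, current_cpu, target_cpu, min_replicas, max_replicas):
    """Simplified scaling rule: ceil(current * metric / target), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical Deployment name; the JSON output can be piped to `kubectl apply -f -`.
    print(json.dumps(build_hpa("idle-service"), indent=2))
    # Two pods at 90% average CPU against a 50% target.
    print(desired_replicas(2, 90, 50, 1, 5))
    # A completely idle pod still leaves the minimum of one replica.
    print(desired_replicas(1, 0, 50, 1, 5))
```

The same autoscaler can also be created without a manifest via `kubectl autoscale deployment idle-service --cpu-percent=50 --min=1 --max=5`, again with a placeholder Deployment name.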

-- Emruz Hossain
Source: StackOverflow