We have a service which is fairly idle most of the time, hence it would be great for us if we could delete all the pods when the service is not getting any request for say 30 minutes, and in the next time when a new request comes Kubernetes will create the first pod and process the response.
Is it possible to set the min pod instance count to 0?
I found that currently, Kubernetes does not support this, is there a way I can achieve this?
This is not supported in Kubernetes the way it's supported by web servers like nginx, apache or app engines like puma, passenger, gunicorn, unicorn or even Google App Engine Standard where they can be soft started and then brought up the moment the first request comes in with downside of this is that your first requests will always be slower. (There may have been some rationale behind Kubernetes pods not having to behave this way, and I can see a lot of design changes or having to create a new type of workload for this very specific case)
If a pod is sitting idle it would not be consuming that many resources. You could tweak the values of your pod resources for request/limit so that you request a small number of CPUs/Memory and you set the limit to a higher number of CPUs/Memory. The upside of having a pod always running is that in theory, your first requests will never have to wait a long time to get a response.
Yes. You can achieve that using Horizontal Pod Autoscale.
See example of Horizontal Pod Autoscale: Horizontal Pod Autoscaler Walkthrough