I am new to containers and Kubernetes. I am doing most of my testing in Azure.
I created a container that scales horizontally based on load. The endpoint hosted in the container has to deal with bursts of requests. I expect the container to sit idle most of the time but, in my testing, it sometimes has to scale out to multiple instances to handle a burst of requests.
To keep the cost of running the service down, I wonder if it is possible to scale the number of container instances down to 0 while keeping the ability to "wake up" the container when requests are about to come in. How could I achieve that?
There have been numerous proposals related to this functionality, but none have been incorporated into Kubernetes.
Idle/Un-idle proposals:
It is a difficult problem because incoming requests need to be queued while the service is spun up.
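To illustrate why the queuing matters, here is a minimal Python sketch of a front-end that holds requests while a scaled-to-zero backend starts. It is only an illustration of the idea, not how Kubernetes or any of the linked projects implement it. The names `Activator`, `start_backend`, and `handle` are hypothetical, and the cold start is simulated with a short sleep.

```python
import asyncio


class Activator:
    """Holds incoming requests while a scaled-to-zero backend starts up."""

    def __init__(self, start_backend, handle):
        # Both callables are hypothetical stand-ins:
        # start_backend() scales the backend from 0 to 1 and returns once it is ready,
        # handle(payload) forwards one request to the running backend.
        self._start_backend = start_backend
        self._handle = handle
        self._ready = asyncio.Event()
        self._wake_task = None
        self.wake_ups = 0

    async def _wake(self):
        self.wake_ups += 1
        await self._start_backend()
        self._ready.set()  # releases every request that is waiting

    async def request(self, payload):
        if not self._ready.is_set():
            # Only the first request of a burst triggers the wake-up.
            if self._wake_task is None:
                self._wake_task = asyncio.create_task(self._wake())
            # Later requests simply wait here until the backend is ready.
            await self._ready.wait()
        return await self._handle(payload)


async def demo():
    async def start_backend():
        await asyncio.sleep(0.05)  # simulated container cold start

    async def handle(payload):
        return f"handled {payload}"

    activator = Activator(start_backend, handle)
    # A burst of three requests arrives while the backend is at zero replicas.
    results = await asyncio.gather(*(activator.request(i) for i in range(3)))
    return results, activator.wake_ups


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch all three requests in the burst wait on the same event, so the backend is started once and no request is dropped. A real implementation would also need timeouts, a limit on how many requests it buffers, and a way to scale back to zero after an idle period.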
If you're interested in implementing this functionality, it looks like some of the work has already been done for you, although it is not necessarily an end-to-end solution for your use case: https://github.com/openshift/service-idler
Here is another project that provides a similar capability: https://github.com/deislabs/osiris