How to handle load spikes and queue requests?

9/7/2016

Is there a configuration in Kubernetes in which I can specify a minimum number of requests to be queued before a new instance gets spawned?

This is the context: We have powerful high-CPU machines provisioned for our use case, and every request places a heavy load on the server. Everything works perfectly until we reach a certain number of requests, say 300 with a ramp-up time of 100 milliseconds. From that point on we receive "Connection refused" errors for a while, and the server only starts handling requests again once a new machine has been spawned. What is the best way to handle these load spikes? I am looking for something like the "Pending latency" setting in App Engine. My application is deployed on Google Compute Engine and orchestrated by Kubernetes.

-- Rakesh Vidya Chandra
google-cloud-platform
google-compute-engine
kubernetes

1 Answer

9/8/2016

You can use a readinessProbe (see container probes) to indicate when a container is ready to serve requests, and a HorizontalPodAutoscaler to automatically scale your app up and down based on observed CPU utilization; sketches of both are shown below. Hope this helps.
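
A minimal sketch of a Deployment with a readinessProbe. The app name "web", the image, port 8080, the /healthz path, and all numbers are illustrative assumptions, not from the question; the point is that Kubernetes only routes traffic to a pod once its readiness check passes, so freshly spawned pods are not hit before they can serve:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0          # illustrative image
            ports:
            - containerPort: 8080
            readinessProbe:                 # pod receives traffic only after this passes
              httpGet:
                path: /healthz              # assumed health endpoint
                port: 8080
              initialDelaySeconds: 5        # wait before the first check
              periodSeconds: 5              # re-check every 5 seconds
            resources:
              requests:
                cpu: "500m"                 # HPA computes utilization against this request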
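And a corresponding HorizontalPodAutoscaler sketch, again with assumed thresholds, targeting the Deployment above:

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 3                        # keep headroom so spikes hit warm pods
      maxReplicas: 10
      targetCPUUtilizationPercentage: 60    # scale out before pods saturate

The same can be created with: kubectl autoscale deployment web --min=3 --max=10 --cpu-percent=60. Note that the HPA reacts on the order of tens of seconds (it polls metrics), so it will not absorb an instantaneous burst by itself; setting minReplicas high enough for the expected spike, combined with readiness gating, is what shrinks the "Connection refused" window.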

-- janetkuo
Source: StackOverflow