Handle sudden increase in traffic size (multiple orders of magnitude) with GKE

9/29/2020

If a website has a door crasher sale where many people (~50K) are waiting for the countdown to finish and enter the page, how would one tackle this with GKE in a cost efficient way?

That seems to be the reason GKE exists, the solution could be that with cluster autoscaler and HPA, GKE can handle the traffic. In practice however it is a different story, when the autoscaler tries to create nodes and pull the image for containers it may take up to a certain time (perhaps up to a min or two in some cases). During that time users see 5XX errors which is not ideal.

Well to tackle that, over-provisioning with paused pods come to mind. However, considering the servers are generally very small in size (they should only handle 100 requests in a normal day) and all of a sudden 50K in a second, how would this be a feasible solution? Paused pods seems to only make sure the autoscaler don't remove nodes that are not working, so in that case 50 nodes must always be occupied with paused pods which I am assuming the running hours are still billable (since nodes are there just not doing anything) in GKE.

What would a feasible solution to serve 100 requests with n1-standard-1 everyday but also be able to scale to ~50k in less than 10 seconds?

-- Greg
google-kubernetes-engine
kubernetes
scalability
serverless

1 Answer

9/29/2020

Not as fast as 10 seconds. That's reachable only if you go serverless.

Pods autoscaling best is 20-30 seconds (depends on your readiness probes, probes of loadbalancer, image cache etc). But you still have to have a pool of nodes to fit that capacity, which is the same money - you're right.

Nodes+Pods autoscaling is around 5 minutes.

If you go serverless, make sure you know (increase?) your account limits. Because it scales so fast and billed per lambda-run - it was very easy to accidentally blow up your bill. Thus all providers limited the default amount of concurrent function executions, e.g. AWS has 1000 per account by default. https://aws.amazon.com/about-aws/whats-new/2017/05/aws-lambda-raises-default-concurrent-execution-limit/. This can be increased through support.

I recall this post for AWS: https://aws.amazon.com/blogs/startups/from-0-to-100-k-in-seconds-instant-scale-with-aws-lambda/. Unfortunately didn't see similar writes for google functions, but I'm sure they have very similar capabilities.

-- Max Lobur
Source: StackOverflow