How to respond with 503 error code in Kubernetes load balancer

5/17/2018

I have a Google Cloud Load Balancer-backed Ingress in my Google Kubernetes Engine cluster. I have an autoscaler set up to scale the number of replicas of my deployment based on CPU usage. Let's say I have set the CPU threshold to 50%.

When there is a burst of requests, the CPU usage goes to 100%. The autoscaler takes a few minutes to notice the high load, create more pods, create new nodes if necessary, and pass health checks. During this scaling period, some or most requests fail with a 502 error due to timeouts. I would rather return a 503 error code immediately when the server is under heavy load than return a 502 error code after the 30-second timeout.

Is it possible to have the load balancer direct traffic to the pods with the lowest CPU usage? Is it possible to return a 503 error code if none of the pods have a CPU usage below a certain threshold, say 80%?

What is standard practice for handling a large burst of traffic, and how should I go about resolving this issue in Kubernetes?

-- Akash Krishnan
google-cloud-platform
kubernetes
load-balancing

1 Answer

5/18/2018

The first problem you are describing (serving 503s) is called "load shedding". Normally it's the responsibility of the application to say: "oops, I'm overloaded, 503, slow down". If you move this responsibility to the client side (the load balancer), it may be too slow to react to give you any real protection - its data will always be behind. From a system reliability point of view, it's better to keep this logic in the server application.
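As a rough illustration of application-side load shedding (not a prescribed implementation - the handler names and the limit of 64 in-flight requests are made-up example values), here is a minimal Go HTTP middleware that rejects requests with 503 once the server is saturated instead of letting them queue up and time out:

```go
package main

import (
	"net/http"
	"time"
)

// loadShed wraps a handler and returns 503 as soon as more than maxInFlight
// requests are being served concurrently. The buffered channel acts as a
// semaphore counting in-flight requests.
func loadShed(maxInFlight int, next http.Handler) http.Handler {
	sem := make(chan struct{}, maxInFlight)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // a slot is free: serve the request
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default: // saturated: shed load immediately instead of queueing
			w.Header().Set("Retry-After", "5")
			http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond) // simulate some CPU-bound work
		w.Write([]byte("ok\n"))
	})
	// 64 is an arbitrary illustrative limit; tune it to one pod's real capacity.
	http.ListenAndServe(":8080", loadShed(64, slow))
}
```

With something like this in each pod, overloaded replicas answer with a fast 503 (which the autoscaler still sees as high CPU) rather than a slow 502 from the load balancer's timeout.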

The second problem is CPU-aware load balancing. One possible approach to this problem is weighted round-robin - like regular round-robin, but preferring less loaded nodes. If you install Istio in Kubernetes, you can select from a list of load balancing policies. One of them is weighted least request - it relies on the number of requests in flight rather than directly on CPU, but if all your requests have about the same CPU cost, it can be a good proxy for CPU load.
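To show the idea behind least-request balancing (this is only a sketch of the "power of two choices" selection that such balancers are commonly based on, not Istio's or Envoy's actual code; the endpoint addresses are invented), a Go version might look like:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// endpoint tracks how many requests are currently in flight to one backend pod.
type endpoint struct {
	addr     string
	inFlight int64
}

// pickLeastRequest samples two random endpoints and picks the one with fewer
// requests in flight ("power of two choices"), which on average steers traffic
// away from the busiest pods.
func pickLeastRequest(eps []*endpoint) *endpoint {
	a := eps[rand.Intn(len(eps))]
	b := eps[rand.Intn(len(eps))]
	if atomic.LoadInt64(&a.inFlight) <= atomic.LoadInt64(&b.inFlight) {
		return a
	}
	return b
}

func main() {
	eps := []*endpoint{{addr: "10.0.0.1"}, {addr: "10.0.0.2"}, {addr: "10.0.0.3"}}
	// Simulate dispatching a few requests; a real proxy would decrement the
	// counter when each response completes.
	for i := 0; i < 5; i++ {
		e := pickLeastRequest(eps)
		atomic.AddInt64(&e.inFlight, 1)
		fmt.Println("sending request to", e.addr)
	}
}
```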

-- Alexandr Lurye
Source: StackOverflow