Scaling an API in a Kubernetes-based application

12/19/2018

I am building an API and using Kubernetes as the cloud-based orchestrator. For reference, I am using Spring and Tomcat for my microservices.

I am looking to scale automatically using HPA and VPA. However, scaling takes some time to react to the load and can be inaccurate depending on the configuration.

My question is: if someone makes a request to my API and Kubernetes needs to spawn a new pod, will the client have to wait those ~30 seconds of startup time in their response? How can I handle this behaviour elegantly?

-- Zeruno
kubernetes
microservices
spring

1 Answer

12/19/2018

You can run your API servers behind a Kubernetes Service. The Service creates a load balancer (and an Endpoints object for it) and will use round robin by default to distribute requests among the web server pods.
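As a minimal sketch, a Service fronting the API pods could look like this (the names, labels, and ports here are assumptions — adjust them to your Deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api            # hypothetical Service name
spec:
  selector:
    app: my-api           # must match the labels on your pod template
  ports:
    - port: 80            # port clients connect to
      targetPort: 8080    # Tomcat's default HTTP port inside the pod
```

Any pod whose labels match the selector (and which is reporting Ready) is automatically added to the Service's endpoints.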

When CPU usage starts to increase, request latency will likely be affected: you might see longer response times because of the increased load on the existing pods.

The request shouldn't have to wait for a new pod to spin up, because a pod that is still starting hasn't registered itself with the Service yet. The load balancer won't be aware of the pod until it is actually ready to serve requests. (Look into the readinessProbe for this — it gates traffic to the pod; the livenessProbe only restarts unhealthy pods.)
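A sketch of how the probes might look in the pod template, assuming Tomcat listens on 8080 and you expose a health endpoint (the path and timings here are illustrative, not prescriptive):

```yaml
# Fragment of a pod/container spec (hypothetical endpoint and timings)
readinessProbe:
  httpGet:
    path: /actuator/health   # e.g. Spring Boot Actuator, if enabled
    port: 8080
  initialDelaySeconds: 20    # give Tomcat time to start before checking
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 60    # restart the container only if it stays unhealthy
  periodSeconds: 10
```

Until the readinessProbe succeeds, the Service simply doesn't route any requests to that pod, so clients never hit a pod that is still booting.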

I guess that unless a request sits queued at the load balancer for a long time (longer than the time taken to spin up a new pod), it shouldn't end up on the new pod at all.

You can tune the thresholds over time to leave some buffer CPU for handling spikes in traffic (also accounting for the time taken to spin up a new pod), and add new pods only when the load stays high.
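For example, an HPA with a CPU target well below 100% leaves that buffer. A sketch (the API version and numbers are assumptions — check what your cluster supports, older clusters use `autoscaling/v2beta1`/`v2beta2`):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api            # the Deployment to scale
  minReplicas: 2            # always keep some headroom for spikes
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out well before pods saturate
```

Setting the target around 60% means existing pods can absorb a spike while new pods are still starting.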

You can also scale pods down when load decreases — the HPA does this automatically once utilization drops back below the target, after a stabilization window.
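On newer clusters the scale-down pace is tunable via the HPA's `behavior` field (this fragment assumes the `autoscaling/v2` API; on older versions the equivalent is the controller's downscale-stabilization flag):

```yaml
# Fragment of an autoscaling/v2 HPA spec
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # require 5 min of low load before removing pods
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60            # remove at most one pod per minute
```

A conservative scale-down like this avoids flapping, where pods are removed and then immediately re-created by the next traffic spike.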

-- Ankit Deshpande
Source: StackOverflow