I'm writing a backend app using Node.js which executes a lot of HTTP requests to external services and S3.
I have reached roughly 800 requests per second on a single Kubernetes pod.
The pod is limited to a single vCPU and it has reached 100% usage.
I can scale it to tens of pods to handle thousands of requests,
but it seems that this limit is reached too soon.
I have tested it in my real backend app and then on a demo pod which does nothing but send HTTP requests using axios.
Does it make sense that a single-vCPU Kubernetes pod can only handle 800 req/sec (as a client, not as a server)?
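As a back-of-the-envelope check, the figures in the question imply a CPU cost per request. The sketch below assumes the client is purely CPU-bound and that cost scales linearly, which real workloads (TLS handshakes, JSON parsing, garbage collection) only approximate; the function names are hypothetical. The 2-vCPU line also assumes the process can actually use both cores, whereas a single Node.js process runs its JavaScript on one thread.

```javascript
// Rough CPU budget per request for a CPU-bound HTTP client.
// Inputs mirror the question: 1 vCPU at 100% usage, ~800 req/s.
function cpuMsPerRequest(vcpus, utilization, requestsPerSecond) {
  // Milliseconds of CPU time consumed per wall-clock second.
  const cpuMsPerSecond = vcpus * utilization * 1000;
  return cpuMsPerSecond / requestsPerSecond;
}

function maxRequestsPerSecond(vcpus, cpuMsPerReq) {
  // Upper bound on throughput once every available core is saturated.
  return (vcpus * 1000) / cpuMsPerReq;
}

const perRequest = cpuMsPerRequest(1, 1.0, 800);
console.log(`~${perRequest} ms of CPU per request`); // prints "~1.25 ms of CPU per request"
console.log(maxRequestsPerSecond(2, perRequest)); // prints 1600
```

In other words, 800 req/s on one saturated vCPU means each request costs about 1.25 ms of CPU; lowering that per-request cost raises the ceiling as much as adding cores does.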
It's hard to give advice on the best approach to choosing compute capacity for your specific needs. However, when you set 1 vCPU in a Pod's resource limits or requests, it is equivalent to 1 CPU unit, which corresponds to one vCPU (virtual core) on the VMs of the most widely used cloud providers.
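In a Pod spec the same amount can be written as `"1"` or `"1000m"` (millicores). The helper below is a simplified, hypothetical parser that shows how these quantities map to CPU units; it only handles plain decimals and the `m` suffix and rejects everything else.

```javascript
// Simplified parser for Kubernetes CPU quantities such as "1", "0.5" or "500m".
// 1 CPU unit = 1000 millicores ("1000m"). Other quantity suffixes are not handled.
function cpuQuantityToUnits(quantity) {
  const match = /^(\d+(?:\.\d+)?)(m?)$/.exec(String(quantity).trim());
  if (!match) {
    throw new Error(`Unsupported CPU quantity: ${quantity}`);
  }
  const value = Number(match[1]);
  // "m" means millicores, so divide by 1000 to get CPU units.
  return match[2] === "m" ? value / 1000 : value;
}

console.log(cpuQuantityToUnits("1"));     // prints 1
console.log(cpuQuantityToUnits("1000m")); // prints 1
console.log(cpuQuantityToUnits("500m"));  // prints 0.5
```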
Thus, I would bet on adding more CPU units to your Pod rather than spinning up more Pods with the same number of vCPUs via the Kubernetes Horizontal Pod Autoscaler (HPA). Keep in mind that if you don't have enough capacity on your node, it's very easy to overload it with lots of Pods, and that would not have a positive effect on the node's compute resources.
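To make the HPA side of this trade-off concrete, here is a minimal sketch of the replica calculation described in the Kubernetes documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the default 0.1 tolerance. The function name is hypothetical, and the real controller also applies min/max replica bounds, stabilization windows and handling of not-ready Pods, all of which are omitted here.

```javascript
// Core HPA scaling rule (simplified):
//   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
// Scaling is skipped when the ratio is within the tolerance (0.1 by default).
function desiredReplicas(currentReplicas, currentMetric, targetMetric, tolerance = 0.1) {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(ratio - 1) <= tolerance) {
    return currentReplicas; // close enough to the target: no change
  }
  return Math.ceil(currentReplicas * ratio);
}

// One Pod at 100% CPU utilization with a 50% target: prints 2
console.log(desiredReplicas(1, 100, 50));
// Four Pods at 30% utilization with a 60% target: prints 2
console.log(desiredReplicas(4, 30, 60));
// 95% against a 100% target is within tolerance: prints 3
console.log(desiredReplicas(3, 95, 100));
```

The HPA only decides how many Pods to run; if the nodes lack free CPU, the extra Pods either stay Pending or compete for the same cores, which is the overload risk described above.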
In your example, there are two key metrics to analyze for the HTTP requests: latency (the time between sending a request and receiving the response) and throughput (requests per second). The general rule always applies: for a fixed number of concurrent requests, increasing the latency decreases the overall throughput.
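This rule follows from Little's law when the client keeps a fixed number of requests in flight: throughput = concurrency / latency. The sketch below uses illustrative numbers (80 concurrent requests, 100 ms and 200 ms latency) that are assumptions, not measurements from the question, and the function names are hypothetical.

```javascript
// Little's law for a closed-loop client: with a fixed number of requests
// in flight, throughput (req/s) = concurrency / latency (s).
function throughputPerSecond(concurrency, latencyMs) {
  return (concurrency * 1000) / latencyMs;
}

// Number of in-flight requests needed to sustain a target rate at a given latency.
function requiredConcurrency(targetPerSecond, latencyMs) {
  return Math.ceil((targetPerSecond * latencyMs) / 1000);
}

console.log(throughputPerSecond(80, 100)); // prints 800
console.log(throughputPerSecond(80, 200)); // prints 400: doubled latency halves throughput
console.log(requiredConcurrency(800, 200)); // prints 160
```

As long as the CPU is not saturated, a client can compensate for higher latency by allowing more concurrent requests; once the vCPU is at 100%, as in the question, extra concurrency no longer raises throughput.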
You can also read about the Vertical Pod Autoscaler (VPA) as an option for managing compute resources in a Kubernetes cluster.