I have deployed some simple services as a proof of concept: an nginx web server, patched as suggested in https://stackoverflow.com/a/8217856/735231 for high performance.
I also edited /etc/nginx/conf.d/default.conf so that the line listen 80; becomes listen 80 http2;.
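The relevant server block in default.conf then looks roughly like this (a sketch based on the stock config):

server {
    listen       80 http2;   # h2c: HTTP/2 over cleartext TCP, no TLS
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}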
I am using the Locust distributed load-testing tool, with a class that swaps the requests module for hyper in order to test HTTP/2 workloads. This may not be optimal performance-wise, but I can spawn many Locust workers, so it's not a huge concern.
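A minimal sketch of such a class, assuming the legacy Locust 0.x API and a Kubernetes service named h2test-nginx (the names and structure here are assumptions, not the actual test code):

import time

from hyper import HTTP20Connection
from locust import Locust, TaskSet, task, events


class Http2Client(object):
    """Wrapper around hyper that reports request timings to Locust."""

    def __init__(self, host, port):
        # secure=False gives plaintext h2c, matching "listen 80 http2"
        self.conn = HTTP20Connection(host, port=port, secure=False)

    def get(self, path):
        start = time.time()
        try:
            stream_id = self.conn.request('GET', path)
            body = self.conn.get_response(stream_id).read()
        except Exception as e:
            events.request_failure.fire(
                request_type='GET', name=path,
                response_time=(time.time() - start) * 1000,
                exception=e)
        else:
            events.request_success.fire(
                request_type='GET', name=path,
                response_time=(time.time() - start) * 1000,
                response_length=len(body))


class NginxTasks(TaskSet):
    @task
    def index(self):
        self.client.get('/')


class Http2User(Locust):
    task_set = NginxTasks
    min_wait = 0  # no think time between requests, so each
    max_wait = 0  # simulated client loops as fast as it can

    def __init__(self, *args, **kwargs):
        super(Http2User, self).__init__(*args, **kwargs)
        # "h2test-nginx" is the assumed Kubernetes service name
        self.client = Http2Client('h2test-nginx', 80)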
For testing, I spawned a GKE cluster of 5 machines (2 vCPU, 4 GB RAM each), installed Helm, and installed the charts for these services (I can post them in a gist later if useful).
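The setup was roughly along these lines (a sketch; the machine type and chart paths are assumptions):

gcloud container clusters create sc1 --num-nodes=5 --machine-type=custom-2-4096
helm init
helm install ./h2test-nginx   # hypothetical chart paths
helm install ./locust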
I configured Locust with min_wait=0 and max_wait=0 so that it spawns as many requests as possible, and ran 10 workers against a single nginx instance.
10 workers, 140 clients: ~2.1k requests per second (RPS)
10 workers, 260 clients: ~2.0k RPS
10 workers, 400 clients: ~2.0k RPS
Now, I try to scale horizontally: I spawn 5 nginx instances and get:
10 workers, 140 clients: ~2.1k RPS
10 workers, 280 clients: ~2.1k RPS
20 workers, 140 clients: ~1.7k RPS
20 workers, 280 clients: ~1.9k RPS
20 workers, 400 clients: ~1.9k RPS
The resource usage is quite low, as shown by kubectl top pod (this is for 10 workers, 280 clients; nginx is not resource-limited, while the Locust workers are limited to 1 CPU per pod):
user@cloudshell:~ (project)$ kubectl top pod
NAME                           CPU(cores)   MEMORY(bytes)
h2test-nginx-cc4d4c69f-4j267   34m          68Mi
h2test-nginx-cc4d4c69f-4t6k7   27m          68Mi
h2test-nginx-cc4d4c69f-l942r   30m          69Mi
h2test-nginx-cc4d4c69f-mfxf8   32m          68Mi
h2test-nginx-cc4d4c69f-p2jgs   45m          68Mi
lt-master-5f495d866c-k9tw2     3m           26Mi
lt-worker-6d8d87d6f6-cjldn     524m         32Mi
lt-worker-6d8d87d6f6-hcchj     518m         33Mi
lt-worker-6d8d87d6f6-hnq7l     500m         33Mi
lt-worker-6d8d87d6f6-kf9lj     403m         33Mi
lt-worker-6d8d87d6f6-kh7wt     438m         33Mi
lt-worker-6d8d87d6f6-lvt6j     559m         33Mi
lt-worker-6d8d87d6f6-sxxxm     503m         34Mi
lt-worker-6d8d87d6f6-xhmbj     500m         33Mi
lt-worker-6d8d87d6f6-zbq9v     431m         32Mi
lt-worker-6d8d87d6f6-zr85c     480m         33Mi
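For reference, the 1-CPU cap on the workers corresponds to a resources block like this in the lt-worker pod spec (a sketch; the actual chart values may differ):

resources:
  limits:
    cpu: "1"   # hard cap: each lt-worker pod may use at most one core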
I ran this test on GKE for easier replication, but I have gotten the same results on a private-cloud cluster.
Why does it seem that it does not matter how many instances of the service I spawn?
UPDATE: As per the first answer, I'm adding information about the nodes and about what happens with a single Locust worker.
1 worker, 1 client: 22 RPS
1 worker, 2 clients: 45 RPS
1 worker, 4 clients: 90 RPS
1 worker, 8 clients: 174 RPS
1 worker, 16 clients: 360 RPS
1 worker, 32 clients: 490 RPS
1 worker, 40 clients: 480 RPS (this seems to be above the maximum number of clients a single worker can sustain)
But above all, it seems that the root problem is that the nodes are at the limit of their CPU capacity (four of the five are at ~100% CPU):
user@cloudshell:~ (project)$ kubectl top node
NAME                                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-sc1-default-pool-cbbb35bb-0mk4   1903m        98%    695Mi           24%
gke-sc1-default-pool-cbbb35bb-9zgl   2017m        104%   727Mi           25%
gke-sc1-default-pool-cbbb35bb-b02k   1991m        103%   854Mi           30%
gke-sc1-default-pool-cbbb35bb-mmcs   2014m        104%   776Mi           27%
gke-sc1-default-pool-cbbb35bb-t6ch   1109m        57%    743Mi           26%
If I understood correctly, you ran the load test on the same cluster/nodes as your pods. This will definitely have an impact on the overall result; I would recommend you split the clients from the servers onto separate nodes so that they do not affect each other (see the sketch below).
From the values you reported, it is clearly visible that the workers are consuming more CPU than the nginx servers.
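One way to do that split, as a sketch (the role=loadgen label is hypothetical): label the nodes reserved for load generation, then pin the Locust workers to them with a nodeSelector in their pod spec.

kubectl label nodes gke-sc1-default-pool-cbbb35bb-0mk4 role=loadgen

# in the lt-worker pod spec (e.g. in the chart's deployment template):
nodeSelector:
  role: loadgen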
You should also test individual server capacity to validate the limit of each server, so that you have a baseline to compare against and can check whether the results are in line with the expected values.