Why can't I horizontally scale a simple HTTP/2 service on Kubernetes?

2/13/2019

I have deployed some simple services as a proof of concept: an nginx web server patched with https://stackoverflow.com/a/8217856/735231 for high performance.

I also edited /etc/nginx/conf.d/default.conf so that the line listen 80; becomes listen 80 http2;.
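 
For reference, the server block in default.conf now looks roughly like this (trimmed to the relevant part; everything other than the listen line is just the stock config shipped with the nginx image):

server {
    listen       80 http2;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}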

I am using the Locust distributed load-testing tool, with a class that swaps the requests module for hyper in order to test HTTP/2 workloads. This may not be optimal in terms of performance, but I can spawn many locust workers, so it's not a huge concern.
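 
The swap is done with a custom client roughly like the sketch below (class names are illustrative, not my exact code, and it assumes the Locust host is given as host:port, e.g. h2test-nginx:80, rather than a full URL); it wraps hyper's connection and fires Locust's request_success/request_failure events so the stats still show up in the UI:

import time

from hyper import HTTPConnection
from locust import Locust, TaskSet, events, task


class Http2Client(object):
    """Minimal hyper-based client that reports request timings back to Locust."""

    def __init__(self, host):
        # assumption: host is "hostname:port", e.g. "h2test-nginx:80"
        self.conn = HTTPConnection(host)

    def get(self, path):
        start = time.time()
        try:
            self.conn.request('GET', path)
            body = self.conn.get_response().read()
        except Exception as e:
            events.request_failure.fire(
                request_type='GET', name=path,
                response_time=(time.time() - start) * 1000, exception=e)
        else:
            events.request_success.fire(
                request_type='GET', name=path,
                response_time=(time.time() - start) * 1000,
                response_length=len(body))


class UserBehavior(TaskSet):
    @task
    def index(self):
        self.client.get('/')


class WebsiteUser(Locust):
    task_set = UserBehavior
    min_wait = 0
    max_wait = 0

    def __init__(self, *args, **kwargs):
        super(WebsiteUser, self).__init__(*args, **kwargs)
        self.client = Http2Client(self.host)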

For testing, I spawned a GKE cluster of 5 machines (2 vCPUs, 4 GB RAM each), installed Helm, and installed the charts for these services (I can post them in a gist later if useful).

I ran Locust with min_wait=0 and max_wait=0 so that it spawned as many requests as possible, with 10 workers against a single nginx instance.

With 10 workers, 140 "clients" total, I get ~2.1k requests per second (RPS).

10 workers, 260 clients: ~2.0k RPS
10 workers, 400 clients: ~2.0k RPS

Now, I try to scale horizontally: I spawn 5 nginx instances and get:

10 workers, 140 clients: ~2.1k RPS
10 workers, 280 clients: ~2.1k RPS
20 workers, 140 clients: ~1.7k RPS
20 workers, 280 clients: ~1.9k RPS
20 workers, 400 clients: ~1.9k RPS

The resource usage is quite low, as reported by kubectl top pod (this is for 10 workers, 280 clients; nginx is not resource-limited, the Locust workers are limited to 1 CPU per pod):

user@cloudshell:~ (project)$ kubectl top pod
NAME                           CPU(cores)   MEMORY(bytes)
h2test-nginx-cc4d4c69f-4j267   34m          68Mi
h2test-nginx-cc4d4c69f-4t6k7   27m          68Mi
h2test-nginx-cc4d4c69f-l942r   30m          69Mi
h2test-nginx-cc4d4c69f-mfxf8   32m          68Mi
h2test-nginx-cc4d4c69f-p2jgs   45m          68Mi
lt-master-5f495d866c-k9tw2     3m           26Mi
lt-worker-6d8d87d6f6-cjldn     524m         32Mi
lt-worker-6d8d87d6f6-hcchj     518m         33Mi
lt-worker-6d8d87d6f6-hnq7l     500m         33Mi
lt-worker-6d8d87d6f6-kf9lj     403m         33Mi
lt-worker-6d8d87d6f6-kh7wt     438m         33Mi
lt-worker-6d8d87d6f6-lvt6j     559m         33Mi
lt-worker-6d8d87d6f6-sxxxm     503m         34Mi
lt-worker-6d8d87d6f6-xhmbj     500m         33Mi
lt-worker-6d8d87d6f6-zbq9v     431m         32Mi
lt-worker-6d8d87d6f6-zr85c     480m         33Mi

I ran this test on GKE for easier replication, but I have gotten the same results in a private-cloud cluster.

Why does it seem not to matter how many instances of the service I spawn?

UPDATE: As suggested by the first answer, I'm adding information on the nodes and on what happens with a single Locust worker.

1 worker, 1 client: 22 RPS
1 worker, 2 clients: 45 RPS
1 worker, 4 clients: 90 RPS
1 worker, 8 clients: 174 RPS
1 worker, 16 clients: 360 RPS
1 worker, 32 clients: 490 RPS
1 worker, 40 clients: 480 RPS (this seems to be above the maximum number of clients a single worker can sustain)

But above all, it seems that the root problem is that the nodes are at the limit of their CPU capacity:

user@cloudshell:~ (project)$ kubectl top node
NAME                                 CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
gke-sc1-default-pool-cbbb35bb-0mk4   1903m        98%       695Mi           24%
gke-sc1-default-pool-cbbb35bb-9zgl   2017m        104%      727Mi           25%
gke-sc1-default-pool-cbbb35bb-b02k   1991m        103%      854Mi           30%
gke-sc1-default-pool-cbbb35bb-mmcs   2014m        104%      776Mi           27%
gke-sc1-default-pool-cbbb35bb-t6ch   1109m        57%       743Mi           26%
-- ssice
horizontal-scaling
kubernetes
nginx

1 Answer

2/13/2019

If I understood correctly, you ran the load test on the same cluster/nodes as your pods. This will definitely have an impact on the overall result; I would recommend splitting the clients from the servers onto separate nodes so that they do not affect each other.
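 
For example, one simple way to do that split on GKE is a dedicated node pool for the load generators (created with gcloud container node-pools create) plus a nodeSelector on each Deployment; the locust-pool name below is just a placeholder, while default-pool and the cloud.google.com/gke-nodepool label already exist on your nodes:

# Fragment of the Locust worker Deployment: pin the workers to their own pool
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: locust-pool

# ...and pin the nginx Deployment to the existing default-pool the same way:
#        cloud.google.com/gke-nodepool: default-pool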

From the values you reported, it is clearly visible that the workers are consuming more CPU than the nginx servers.

You should check the following:

  • Host CPU utilization: it might be under high pressure from context switching, because the number of threads is much higher than the number of CPUs available.
  • A network bottleneck: maybe you could try adding more nodes or increasing the worker capacity (SKU), and split the clients from the servers.
  • The clients not having enough capacity to generate the load: you increase the number of threads, but the raw limits are the same.

You should also test individual server capacity to validate the limit of each server, so you have a baseline to tell whether the results are in line with the expected values.
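 
For instance (just one way to do it), you could point a single Locust worker directly at one nginx pod's IP, bypassing the Service, to see what one pod can sustain on its own:

kubectl get pods -o wide    # the IP column gives each pod's address; target <pod-ip>:80 directly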

-- Diego Mendes
Source: StackOverflow