In trying to achieve a highly available web server setup, I ran a load test against an extremely simple NGINX Docker container serving a single static file.
At first, I tried a single-node (n1-standard-4), single-pod cluster to benchmark how much one "unit" could handle. This single-node/single-pod setup coped with around 20k concurrent requests before some requests started to time out and throughput dropped.
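For reference, the initial cluster was created with something along these lines (the cluster name and zone below are placeholders, not my exact values):

# Roughly how the single-node cluster was created
# (cluster name and zone are placeholders)
gcloud container clusters create load-test \
    --machine-type=n1-standard-4 \
    --num-nodes=1 \
    --zone=us-central1-b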
I then added another node of the same machine type and scaled the pods to two replicas. Once I confirmed both pods/nodes were up, I ran the test again at 20k. Performance was good, so I bumped up to 40k, expecting to see results similar to the 20k test on the one-node/one-pod setup.
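(For reference, the scale-out was done roughly like this; the names are placeholders, not my exact values:)

# Add a second node of the same machine type
gcloud container clusters resize load-test --size=2 --zone=us-central1-b

# Scale the replication controller to two replicas
kubectl scale rc nginx-rc --replicas=2

# Confirm both pods are running and spread across the nodes
kubectl get pods -o wide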
However, the performance was very poor, with requests per second jumping violently between 15k and 30k.
I tried the same test again with 4 nodes/pods and saw similar, if not slightly worse results.
My questions are:
Am I wrong to think that my concurrent requests should scale linearly this way with GKE/containers/kubernetes?
What am I missing in order to achieve the desired results of being able to cope with N concurrent users with M nodes/pods in the cluster?
EDIT: I also do not believe this is an issue with the load-testing tool, as I am using an external paid service that claims to be able to simulate up to 100k concurrent requests.
EDIT 2: Here's some more information about the setup:
Dockerfile:
FROM nginx
ADD nginx.conf /etc/nginx/nginx.conf
ADD index.html /usr/share/nginx/html/index.html
ADD data.json /usr/share/nginx/html/data.json
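The image was built and pushed to the container registry with the usual commands; the project ID and image name below are placeholders (registry auth via gcloud auth configure-docker is assumed to already be set up):

# Build the image and push it to Google Container Registry
# (project ID and image name are placeholders)
docker build -t gcr.io/my-project/nginx-load-test .
docker push gcr.io/my-project/nginx-load-test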
nginx.conf:
user  nginx;
worker_processes  4;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  4096;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  off;

    sendfile           on;
    #tcp_nopush        on;

    keepalive_timeout  65;

    #gzip on;

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;

        root /usr/share/nginx/html;

        # Serve the index.html page
        location / {
            try_files /index.html =404;
        }
    }

    include /etc/nginx/conf.d/*.conf;
}
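As a sanity check, the configuration parses cleanly when tested inside the container (image name is a placeholder):

# Validate the nginx configuration baked into the image
# (image name is a placeholder)
docker run --rm gcr.io/my-project/nginx-load-test nginx -t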
index.html:
<head><title>Load Test Page</title></head><body><h3>Load Test!</h3></body>
I used the "LoadBalancer" service, which I believe setups a Google Cloud Balancer (I confirmed this and saw the nodes being added to it). Here are the files below I used with kubernetes to manage the cluster.
rc.yml:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-rc
  labels:
    name: nginx-rc
spec:
  replicas: 2
  selector:
    name: nginx-server
  template:
    metadata:
      labels:
        name: nginx-server
      name: nginx-server
    spec:
      containers:
      - name: nginx-server
        image: [[ my image ]]
        ports:
        - containerPort: 80
          hostPort: 80
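The controller was created and checked with the standard commands:

# Create the replication controller and verify the pods come up
kubectl create -f rc.yml
kubectl get rc nginx-rc
kubectl get pods -o wide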
services.yml:
apiVersion: v1
kind: Service
metadata:
  labels:
    name: nginx-lb
  name: nginx-lb
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    name: nginx-server
  type: LoadBalancer
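The service was created the same way; the external IP of the provisioned load balancer shows up once it is ready:

# Create the LoadBalancer service and wait for the external IP to appear
kubectl create -f services.yml
kubectl get svc nginx-lb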
If you want to run a preconfigured load test, you can check out the instructions for the Kubernetes scale-demo, which shows how to serve 1 million QPS of static files using nginx; it is very similar to your test setup.
Also keep in mind that a single VM (regardless of the number of pods running on it) has a limit on how much bandwidth it can use and how many packets per second it can process. To serve more load, use a larger VM type or add more VMs.
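For example, one way to do that is to add a node pool of larger machines alongside the existing nodes; the cluster name, pool name, zone, and machine type below are only illustrative:

# Illustrative only: add a node pool of larger VMs to absorb more traffic
# (cluster, pool, zone and machine type are placeholders)
gcloud container node-pools create big-pool \
    --cluster=load-test \
    --machine-type=n1-standard-16 \
    --num-nodes=2 \
    --zone=us-central1-b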