Kubernetes NodePort concurrent connection limit

12/15/2018

I'm running Kubernetes on AWS EKS. I'm performing load tests against a NodePort service and seeing a concurrent connection limit of ~16k-20k when hitting a node the pod is not running on. I'm wondering if there's a way to increase the number of concurrent connections.

I'm running a NodePort service with only one pod, which is scheduled on node A. The load test tries to open as many concurrent websocket connections as possible. The websockets just sleep and send a heartbeat every 30s to keep the connection alive.
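Each client is essentially this idle loop (a sketch using websocat as a stand-in for the real load tester; <node-a> and <nodeport> are placeholders):

# hold one websocket open, sending a heartbeat every 30s
while sleep 30; do echo heartbeat; done | websocat ws://<node-a>:<nodeport>/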

When I point the load tester (tsung) at node A, I can get upwards of 65k concurrent websockets before the pod gets OOMKilled, so memory is the limiting factor and that's fine. The real problem is that when I point the load tester at node B and kube-proxy's iptables rules forward the connection to node A, all of a sudden I can only get about 16k-20k concurrent websocket connections before the connections start stalling. According to netstat, they are getting stuck in the SYN_SENT state.

netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
...
20087 ESTABLISHED
30969 SYN_SENT

The only thing I can think of to check is my conntrack limit, and it looks fine. Here is what I get for node B.

net.netfilter.nf_conntrack_buckets = 16384
net.netfilter.nf_conntrack_max = 131072
net.nf_conntrack_max = 131072 
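To double-check that the table isn't actually filling up mid-test, the live entry count can be compared against that ceiling (assuming conntrack-tools is installed; the proc file reports the same value):

conntrack -C                                    # live tracked-flow count
cat /proc/sys/net/netfilter/nf_conntrack_count  # same value via proc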

Here is the ephemeral port range. I'm not sure whether it matters (specifically, whether DNAT and SNAT use up local ports), but the range spans well more than 16k ports.

net.ipv4.ip_local_port_range = 32768    60999
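If SNAT does burn one local port per forwarded connection (an assumption on my part, but that is how masquerading to a single pod IP and port would work), the hard ceiling for this path would be the size of that range:

$ echo $(( 60999 - 32768 + 1 ))
28232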

The file descriptor limit and kernel TCP settings are the same for node A and node B so I think that rules them out.
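For reference, these are the kinds of values I compared on both nodes (which settings actually matter here is, of course, part of the question):

ulimit -n                                               # per-process file descriptor limit
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog  # listen/SYN queue limits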

Is there anything else that could be limiting the number of concurrent connections forwarded through iptables/netfilter?

-- Jesse Shieh
iptables
kubernetes
netfilter
networking
websocket

1 Answer

12/16/2018

You are always going to get worse performance when hitting the NodePort of a node where your pod is not running. Essentially, your packets take extra hops (through iptables DNAT/SNAT) to reach their final destination.

I'd recommend preserving the client source IP for your NodePort service. Basically, patch your service with this:

$ kubectl patch svc <your-service> -p '{"spec":{"externalTrafficPolicy":"Local"}}'
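You can check that the patch took effect (same placeholder service name):

$ kubectl get svc <your-service> -o jsonpath='{.spec.externalTrafficPolicy}'
Local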

Then let your load balancer forward traffic only to the nodes where a serving pod is actually running.
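If you front this with a type: LoadBalancer Service, Kubernetes allocates a dedicated health check port for exactly that purpose; the cloud load balancer probes it and only routes to nodes reporting local endpoints:

$ kubectl get svc <your-service> -o jsonpath='{.spec.healthCheckNodePort}'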

Alternatively, if you'd like something better performing, you could switch kube-proxy to IPVS mode or use something BPF-based like Cilium for your overlay.
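For example, switching to IPVS mode is a kube-proxy configuration change on most installs (a sketch; the exact ConfigMap name and layout vary by distribution, and the nodes need the ip_vs kernel modules loaded):

$ kubectl -n kube-system edit configmap kube-proxy
# in the KubeProxyConfiguration block, set:
#   mode: "ipvs"
$ kubectl -n kube-system rollout restart daemonset kube-proxy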

-- Rico
Source: StackOverflow