Uneven OKD Infra Load Balancing using `roundrobin` and `leastconn`

2/25/2019

Using OpenShift v3.10.0+0c4577e-1, I'm seeing uneven load across the 4 VMs hosting all my pods, which are endpoints behind a single service route.

I've set up a pretty straightforward testing environment and I'm seeing some behavior that doesn't make any sense to me.

I've got a physical host running JMeter, configured to make requests to a single service route IP backed by a collection of pods. The pods host a very lightweight WordPress site; each pod is identical for the purposes of this test.

The pods are split among 4 VMs. All 4 VMs run on a single physical host, separate from the JMeter host.

As I scaled out the number of VMs, each additional VM was less and less loaded. When I moved from 2 to 3 VMs (50% more VMs) I only saw a 33% improvement in the number of requests I could handle; moving from 3 to 4 VMs (33% more VMs), the improvement was around 18%. I'm not anticipating perfect scaling, but the graphs below don't make sense to me:

[Graph: VM CPU loading]

As you can see, the first VM is using 100% of its available CPU (8 vCPUs), but each subsequent VM uses less and less. The 4th VM is down to ~75%. I'd expect the loading to be a lot closer.

The graphs show two tests: the first used the `roundrobin` strategy; for the second I switched to `leastconn`.

Is this a problem with the load-balancing strategies? Is there some way to better balance the requests across the VMs?
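For reference, here is how the strategy switch described above is typically done on an OpenShift 3.x HAProxy router. The route name `wp-route` is a hypothetical placeholder; the annotation and environment variable are the documented knobs, but verify against your router version:

```shell
# Set the balancing strategy for one route (values: roundrobin, leastconn, source).
# "wp-route" is a placeholder name for the route under test.
oc annotate route wp-route haproxy.router.openshift.io/balance=leastconn --overwrite

# Or change the default strategy for the entire router deployment:
oc set env dc/router ROUTER_LOAD_BALANCE_ALGORITHM=leastconn -n default
```

The per-route annotation takes precedence over the router-wide default, so it's the safer way to experiment without affecting other routes.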

-- John Westlund
haproxy
kubernetes
openshift
openshift-origin

1 Answer

3/2/2019

I've split the pods among more routes so that I'm no longer rate-limited by the "fully loaded" VM. I still believe the pods inside that VM are underperforming relative to the other VMs, but at least I'm no longer leaving performance on the table on the rest of the VMs.
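A minimal sketch of the splitting approach, assuming hypothetical pod and service names: label subsets of the pods, create a service per subset, and expose each service through its own route so HAProxy balances each group independently:

```shell
# Label two subsets of the WordPress pods (names are placeholders):
oc label pod wp-1 wp-2 shard=a --overwrite
oc label pod wp-3 wp-4 shard=b --overwrite

# Create a service per shard and point its selector at the label:
oc create service clusterip wp-a --tcp=80:80
oc set selector service wp-a shard=a
oc create service clusterip wp-b --tcp=80:80
oc set selector service wp-b shard=b

# Expose each service through its own route:
oc expose service wp-a
oc expose service wp-b
```

The load generator then has to spread requests across the resulting route hostnames itself, which is what shifts the bottleneck away from a single router backend pool.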

-- John Westlund
Source: StackOverflow