Curious Kubernetes load-balancing result when integrating with Spring Cloud Gateway, need some insights

6/4/2020

Hi there,

This problem has confused me for several days. Each step tested fine on its own, but things go wrong when they are put together. I have reproduced the issue with a simplified setup; here is the summary.

  1. I have a Spring Boot service deployed on Kubernetes. It has two instances (pods), and the service IP is A.
  2. We have a gateway based on Spring Cloud Gateway (IP is B), also deployed on Kubernetes. The gateway is expected to accept requests and dispatch them to specific services according to route rules.
  3. For testing purposes, I also set up an NGINX reverse proxy (IP is C), which proxies requests directly to IP A above. It is used in the test cases below.

As there are only two instances and there is no specific config, the default round-robin behavior should have the two instances handle requests alternately. I verified this behavior by sending requests to IP A directly.
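For reference, a check like the one described above could be sketched roughly as follows. This is only an illustration, not the exact script used: it assumes the service answers each request with the name of the pod that handled it, and `fetch_handler`, `tally`, `is_balanced` and the 20% tolerance are hypothetical names and values chosen for this sketch.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_handler(url, timeout=5.0):
    """Send one request and return the response body.

    Assumption: the service replies with the name of the pod that
    handled the request (this is not stated in the original setup).
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def tally(handlers):
    """Count how many requests each instance handled."""
    return Counter(handlers)


def is_balanced(counts, tolerance=0.2):
    """Return True if at least two instances were hit and each one
    handled its even share of requests, within the given relative
    tolerance (0.2 means 20% above or below the even share)."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return False
    expected = total / len(counts)
    return all(abs(n - expected) <= tolerance * expected for n in counts.values())
```

To use it against a real endpoint, one would collect something like `[fetch_handler(url) for _ in range(100)]` with `url` pointing at IP A, the gateway, or NGINX, and pass the result to `tally` and `is_balanced`.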

These are the test cases and whether they work. "OK" means the traffic was well balanced, "Not OK" means it was not, and "->" shows the path a request takes through the topology.

  1. post request to service (in gateway pod) -> K8S ClusterIP: OK
  2. post request to service (in K8S node) -> K8S ClusterIP: OK
  3. post request to service (in K8S node) -> NGINX -> K8S ClusterIP: OK
  4. post request to gateway -> gateway -> K8S ClusterIP: Not OK, requests processed by only one instance
  5. post request to gateway -> gateway -> K8S ClusterIP -> NGINX: OK
  6. post request to gateway -> gateway -> NGINX -> K8S ClusterIP: OK

More tests show that requests were in fact handled by both instances in case #4, but only after 200 requests had been handled by one instance did the following requests go to the other instance. The batch size is exactly 200! I have not found any documentation on this strange behavior.
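The batching can be made visible by looking at run lengths, i.e. how many consecutive requests the same instance handled. The snippet below is a minimal sketch, assuming the sequence of handling instances has already been recorded in request order; `run_lengths` and the pod names are hypothetical.

```python
from itertools import groupby


def run_lengths(handlers):
    """Collapse a sequence of instance names into (name, run length) pairs,
    one pair per stretch of consecutive requests handled by the same instance."""
    return [(name, sum(1 for _ in group)) for name, group in groupby(handlers)]
```

With strictly alternating instances every run has length 1, whereas the pattern seen in case #4 would show up as runs of 200, for example `[("pod-a", 200), ("pod-b", 200)]`.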

The strangest thing is that the gateway and the k8s service know nothing about each other except the IP address. How is it possible that the gateway and k8s each work separately but break when glued together, and that everything goes back to normal after inserting another layer (NGINX)?

Your input is highly appreciated.

EDIT: The batch size of 200 is not general. More tests on another service do not follow this pattern.

-- Bingfeng
kubernetes
spring-cloud-gateway

0 Answers