Hi there,
This question has confused me for several days. Each step tested fine on its own, but things go wrong when they are put together. I have reproduced the issue with a simplified setup; here is the summary.
As there are only two instances and no specific config, the default round-robin behavior should make the two instances handle requests alternately, one by one. I verified this behavior by sending requests to IP A directly.
These are the test cases and their results. "OK" means the traffic was well balanced, "Not OK" means it was not, and "->" shows how the request flows through the topology.
Further tests showed that requests were handled by both instances in case #4, but only after one instance had handled 200 requests did the other instance start handling them. The batch size is exactly 200! I have not found any documentation on this strange behavior.
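A minimal sketch of how this batching could be measured, assuming each response body contains an identifier of the instance that served it. The `fetch_instance_id` and `measure` helpers and the URL passed to them are hypothetical, not the actual test harness used here. `run_lengths` collapses consecutive responses from the same instance into (instance, count) pairs, so a strict round robin shows runs of length 1 while the behavior above shows runs of 200.

```python
from collections import Counter
from itertools import groupby
from urllib.request import urlopen


def fetch_instance_id(url: str) -> str:
    """Return the instance identifier from one response.

    Assumption: the service echoes its own instance name in the body.
    """
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8").strip()


def run_lengths(instance_ids):
    """Collapse consecutive identical IDs into (id, count) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(instance_ids)]


def summarize(instance_ids):
    """Return per-instance totals and the list of consecutive runs."""
    ids = list(instance_ids)
    return Counter(ids), run_lengths(ids)


def measure(url: str, n: int = 1000):
    """Send n sequential requests to url and summarize who served them.

    The url is whatever entry point is under test (hypothetical here).
    """
    return summarize(fetch_instance_id(url) for _ in range(n))
```

Calling `measure` against the entry point under test, with the real address substituted, returns the per-instance totals together with the runs. Equal totals with long runs would match case #4: both instances get traffic, but in batches rather than alternately.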
The strangest thing is that the gateway and the k8s service know nothing about each other except the IP address. How is it possible that the gateway and k8s each work separately, but break when glued together? And why does everything go back to normal after another layer (the NGINX) is inserted?
Any input is highly appreciated.
EDIT: The batch size of 200 is not general. Further tests on another service did not follow this pattern.