Autoscaling service from single async client (one connection)

8/26/2017

I have a client that makes asynchronous calls to a gRPC service managed by Kubernetes. The calls are computationally expensive and each takes a while to complete, so many of them wait in a queue for a response (as shown in this tutorial https://grpc.io/docs/tutorials/async/helloasync-cpp.html or, more specifically, https://github.com/grpc/grpc/blob/v1.4.x/examples/cpp/helloworld/greeter_async_client2.cc). What I notice is that all the calls are served by the same pod, while the other pods on my cluster remain unused.

If I launch multiple instances of the client, they are served by different nodes or pods, but I'm interested in this happening for calls from a single async client connection.

Is this possible, and if so, does it require some specific configuration?

(I realize that I could open many connections from one script, but this does not seem optimal?)

I should also mention that I'm running a local Kubernetes setup with just a few nodes, which was set up using kubeadm.

-- toeplitz
kubernetes

1 Answer

8/26/2017

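kube-proxy is an L4 load balancer, so it cannot distinguish between the separate HTTP requests (L7) carried inside one TCP connection. gRPC multiplexes all calls from a channel over a single HTTP/2 connection, so the backend pod is chosen once, when that connection is established, and every subsequent call lands on the same pod. Depending on what you are trying to achieve, an L7 proxy (one that supports HTTP/2) could be a solution.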

There is a nice overview in this document: https://grpc.io/blog/loadbalancing
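
To make the difference concrete, here is a small self-contained C++ sketch. It is only a toy model and does not use gRPC or Kubernetes: the function names are hypothetical, and plain round-robin is assumed as the selection policy for both cases (real balancers may pick backends differently, for example at random). It just counts how many requests each pod would handle when the backend is chosen per connection versus per request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical names, round-robin assumed for illustration).
// Each function returns the number of requests handled by each pod.

// Connection-level (L4) balancing: the pod is chosen once, when the
// connection is opened, and every request on that connection follows it.
std::vector<int> balance_per_connection(std::size_t pods,
                                        std::size_t connections,
                                        std::size_t requests_per_connection) {
    assert(pods > 0);
    std::vector<int> handled(pods, 0);
    for (std::size_t c = 0; c < connections; ++c) {
        const std::size_t pod = c % pods;  // decided per connection
        handled[pod] += static_cast<int>(requests_per_connection);
    }
    return handled;
}

// Request-level (L7) balancing: the proxy sees individual requests
// inside the connection and routes each one separately.
std::vector<int> balance_per_request(std::size_t pods,
                                     std::size_t connections,
                                     std::size_t requests_per_connection) {
    assert(pods > 0);
    std::vector<int> handled(pods, 0);
    std::size_t next = 0;  // round-robin cursor shared by all requests
    for (std::size_t c = 0; c < connections; ++c) {
        for (std::size_t r = 0; r < requests_per_connection; ++r) {
            handled[next % pods] += 1;  // decided per request
            ++next;
        }
    }
    return handled;
}
```

With 3 pods and one connection carrying 9 requests, `balance_per_connection(3, 1, 9)` puts all 9 requests on one pod, which matches the behaviour described in the question, while `balance_per_request(3, 1, 9)` spreads them 3/3/3. Opening 3 connections also evens things out in the per-connection model, which matches the observation that multiple client instances reach different pods.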

-- Janos Lenart
Source: StackOverflow