I've included more detail below, but the question I'm trying to answer is in the title. I'm currently trying to figure this out, but thought I'd ask here first in case anyone knows the answer off-hand.
About my setup
I have a Kubernetes service running on a Google Compute Engine cluster (started via Google Container Engine). It consists of a service (providing a stable front-end IP), a replication controller, and pods running a Python gRPC server that listens on a port (sleeping while idle).
There are 2 pods (replicas: 2 specified in the replication controller), one replication controller, one service, and 4 GCE instances (set to autoscale up to 5 based on CPU).
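For reference, the replication controller side of that setup would look roughly like this (a sketch only; the names, image, and port below are placeholders, not my actual config):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: grpc-server        # placeholder name
spec:
  replicas: 2              # the two pods mentioned above
  selector:
    app: grpc-server
  template:
    metadata:
      labels:
        app: grpc-server
    spec:
      containers:
      - name: grpc-server
        image: gcr.io/my-project/grpc-server:latest  # placeholder image
        ports:
        - containerPort: 50051   # gRPC's conventional port; assumed
```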
I'd like the service to be able to handle an arbitrary number of clients that want to stream information. However, I'm currently seeing that the service only ever talks to 16 clients at a time.
I'm hypothesizing that the number of connections is either limited by the number of GCE instances I have, or by the number of pods. I'll be doing experiments to see how changing these numbers affects things.
Figured it out:
kubectl scale rc <rc-name> --replicas=3
to support 24 clients. I'll be looking into autoscaling the number of pods (with a horizontal pod autoscaler?) based on incoming HTTP requests.
Update 1:
Kubernetes doesn't currently support horizontal pod autoscaling based on HTTP request metrics.
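CPU-based autoscaling is available, though, via kubectl autoscale (a sketch; <rc-name> and the thresholds are placeholders):

```
# Keep the rc between 2 and 5 replicas, targeting ~70% CPU (values illustrative)
kubectl autoscale rc <rc-name> --min=2 --max=5 --cpu-percent=70
```

This won't react to connection counts directly, but if each open stream costs CPU it can serve as a rough proxy.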
Update 2:
Apparently there are other factors at play here, such as the size of the thread pool available to the server. With N threads per pod and P pods, I'm able to maintain P*N open channels. This works particularly well for me because my clients only need to poll the server once every few seconds, and they sleep when inactive.
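The thread-pool cap can be demonstrated with nothing but the standard library: a pool with N workers never runs more than N blocking handlers at once, which is what bounds concurrent streams when such a pool is handed to grpc.server. This is a sketch with made-up numbers, not my server code:

```python
from concurrent import futures
import threading
import time

# Stands in for the executor handed to grpc.server(...) in a Python gRPC
# server; MAX_WORKERS plays the role of N (the value is illustrative).
MAX_WORKERS = 4

lock = threading.Lock()
release = threading.Event()
running = 0   # handlers currently executing
peak = 0      # most handlers ever running at once

def handler(client_id):
    """Stands in for a blocking streaming RPC: holds its thread until released."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    release.wait()   # the thread stays occupied, like a long-lived stream
    with lock:
        running -= 1

pool = futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)
tasks = [pool.submit(handler, i) for i in range(10)]  # 10 "clients" connect
time.sleep(0.5)        # give the pool time to saturate
release.set()          # let all handlers finish
pool.shutdown(wait=True)
print(peak)            # prints 4: only MAX_WORKERS handlers ran concurrently
```

So with P pods each running a pool of N workers, at most P*N blocking streams are served at once; any further clients queue until a thread frees up, which is why scaling to 3 pods raised my limit from 16 to 24.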