We are developing simulation software that is deployed and scaled across multiple pods using Kubernetes. When a user makes a simulation request, a pod is selected, starts the job, and is considered busy. When another user makes a simulation request, it should be routed to a free pod. Currently, a busy pod is often selected (even though free ones exist) because Kubernetes does not know which pods are busy and which are free.
Is it possible to balance requests in such a way that a free pod is always selected? (Assume that each app instance inside a pod exposes an HTTP endpoint that reports its current busy/free status.)
I think you can make use of readiness probes:
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
You can make the application respond to readiness probe requests with a non-200 status code while it is busy. Kubernetes will mark the pod as not ready and stop routing new requests to it through the Service until the readiness probe succeeds again.
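Here is a minimal sketch of what that probe might look like, assuming the app serves its busy/free status at /status on port 8080 (both the path and the port are illustrative) and returns 503 while a simulation is running:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simulation
spec:
  containers:
    - name: simulation
      image: example/simulation:latest   # placeholder image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /status        # the app's busy/free endpoint; 200 = free, 503 = busy
          port: 8080
        periodSeconds: 1       # poll frequently so a busy pod is noticed quickly
        failureThreshold: 1    # mark the pod unready after a single busy response
        successThreshold: 1    # mark it ready again after one free response
```

There are downsides, though: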