One of my microservices is running on Kubernetes. I would like to tell the K8s load balancer when a pod is busy, because the behaviour I currently get is not acceptable.
One example:
I have 8 pods running, and each pod can process 1 request at a time. Each request takes 70 to 100% of the CPU core allocated to the pod. But when I send 8 requests to my application, Kubernetes does not dispatch those requests to the 8 pods; it tries to use only one. And since I restrict each replica of the app (via a thread pool) to one thread at a time, the requests are of course queued for pod 1.
So my question is: how can I tell Kubernetes that pod 1 is busy and that the load balancer must dispatch request 2 to pod 2?
Note: For dev and test purposes I'm using Docker Desktop (Docker for Windows) on Windows 10 and kubectl.
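You have to use probes. A livenessProbe restarts a container that fails it; a readinessProbe is the one that matters here:
when a Pod is not able to handle a request and its readiness probe fails, its IP is removed from the Service endpoints, so no traffic is forwarded to it.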
As prometherion suggested, you can use the liveness probe, and I would also suggest adding the readiness probe together with it.
You can have a look at the official documentation: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Sometimes, applications are temporarily unable to serve traffic, for example when an application first needs to load large data or configuration files during startup.
In such cases, you don’t want to kill the application, but you don’t want to send traffic to its pods either. K8s provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.
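To make the readiness idea concrete for the "one request at a time" case, here is a minimal sketch in Python of an app that reports itself not ready while it is busy. Everything in it is an assumption for illustration, not the asker's actual service: the `/ready` and `/work` paths, port 8080, and the `process_request` placeholder are all hypothetical. A readinessProbe with an `httpGet` on that path and port would then see 503 while the single worker is occupied.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Held for as long as the single worker is processing a request.
busy = threading.Lock()


def readiness_status() -> int:
    """Return 200 when the worker is idle, 503 while it is busy."""
    return 503 if busy.locked() else 200


def process_request() -> None:
    """Hypothetical placeholder for the CPU-heavy work of one request."""
    time.sleep(5)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":
            # Assumed probe endpoint: the readinessProbe would poll this path.
            self._reply(readiness_status())
        elif self.path == "/work":
            # Refuse instead of queueing when the worker is already taken.
            if not busy.acquire(blocking=False):
                self._reply(503)
                return
            try:
                process_request()
                self._reply(200)
            finally:
                busy.release()
        else:
            self._reply(404)

    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


if __name__ == "__main__":
    # A threading server so /ready can still answer while /work is running.
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

Keep in mind that the kubelet polls a readiness probe at an interval (`periodSeconds`), so a pod is taken out of the Service endpoints with some delay rather than at the instant it becomes busy. Requests arriving in that window can still reach a busy pod, which is why the sketch answers them with 503 instead of queueing them.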