I use Kubernetes in my project, specifically the HPA (Horizontal Pod Autoscaler). Every minute, the project sends a check-status
request to verify that all microservices are available. A microservice counts as available if any one of its replicas (not all of them) responds. \
However, I have an issue related to the HPA. When the HPA automatically decides to remove some pods from the cluster and my check-status
request arrives at the same time, my API gateway service very often routes the request to a pod that is being deleted and gets no response. Our server then treats that microservice as unavailable.\
My question is: what is the best way to configure the autoscaler to avoid these cases?
This is not really about the HPA; it is about how you gracefully shut down your pods.
In short, your Service/load balancer does not automatically know whether your pod is still able to accept new requests. So, on a SIGTERM
signal, your app should start failing its readiness probe
and give itself some time before shutting down. While the readiness probe is failing, the Service won't send new requests to your pod.
You can then shut the pod down once all in-flight requests have been handled AND the pod is no longer receiving new ones.
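
To make the sequence concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not a drop-in implementation: the `/ready` path, port 8080, the `DRAIN_SECONDS` value, and the helper names `make_server` and `begin_shutdown` are all hypothetical. Your readiness probe would have to point at whatever path your app really exposes, and the drain period has to fit inside the pod's `terminationGracePeriodSeconds`.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical drain period. It should be long enough for the readiness probe
# to fail and short enough to fit within terminationGracePeriodSeconds.
DRAIN_SECONDS = 10

# Set once SIGTERM has been received; read by the readiness endpoint.
shutting_down = threading.Event()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /ready (assumed probe path) reports 503 while draining.
        # Every other path keeps answering 200 so in-flight work still succeeds.
        if self.path == "/ready" and shutting_down.is_set():
            status = 503
        else:
            status = 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port):
    server = ThreadingHTTPServer(("", port), Handler)
    # Non-daemon handler threads let server_close() wait for in-flight requests.
    server.daemon_threads = False
    return server


def begin_shutdown(server, drain_seconds=DRAIN_SECONDS):
    """Fail readiness immediately, then stop the server after the drain period."""
    shutting_down.set()

    def drain_then_stop():
        time.sleep(drain_seconds)
        # shutdown() must run in a different thread than serve_forever().
        server.shutdown()

    worker = threading.Thread(target=drain_then_stop, daemon=True)
    worker.start()
    return worker


def main():
    server = make_server(8080)
    signal.signal(signal.SIGTERM, lambda signum, frame: begin_shutdown(server))
    server.serve_forever()   # returns once drain_then_stop() calls shutdown()
    server.server_close()    # waits for handler threads still running


# Call main() from your container entrypoint.
```

The key design choice is that the SIGTERM handler does not stop the server right away. It only flips the readiness flag, so the pod keeps serving during the drain period while traffic is steered elsewhere, and it exits afterwards.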
I would advise reading these sources: