I use Kubernetes in my project, especially the HPA (Horizontal Pod Autoscaler). Every minute, our project sends a check-status request to verify that all microservices are available. A microservice counts as available if one of its replicas (not necessarily all of them) responds. \
But there is one issue related to the HPA. When the HPA decides to remove some pods and my check-status request reaches the server at the same moment, my API gateway service quite often forwards it to a pod that is being deleted and gets no response. As a result, the microservice is reported as unavailable.\
My question is: what is the best way to configure the autoscaler to avoid these cases?
This is not really related to the HPA; it is about how you gracefully shut down your pods.
In short, your Service/load balancer does not immediately know that your pod is going away, so when the pod receives a SIGTERM it should start failing its readiness probe and give the app some time before shutting down. Once the readiness probe fails, the Service stops sending new requests to that pod.
Then you can shut it down once all in-flight requests have been handled AND the pod no longer receives new requests. All of this has to fit within the pod's `terminationGracePeriodSeconds` (30 seconds by default), after which the container is killed with SIGKILL.
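The shutdown sequence described above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a drop-in implementation: the `/ready` path, port 8080, `DRAIN_SECONDS` and the helper names (`readiness_status`, `graceful_shutdown`, `DrainingServer`) are hypothetical and have to match your own pod's `readinessProbe` and `terminationGracePeriodSeconds`. On SIGTERM it starts failing the readiness probe, keeps serving normal traffic for a drain period, then stops accepting connections and waits for in-flight requests.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical drain period: should be longer than the readiness probe's
# periodSeconds * failureThreshold and shorter than the pod's
# terminationGracePeriodSeconds (30 s by default).
DRAIN_SECONDS = 10

# Set once SIGTERM has been received.
shutting_down = threading.Event()


def readiness_status():
    """HTTP status returned to the readiness probe."""
    return 503 if shutting_down.is_set() else 200


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":  # hypothetical readiness probe path
            status = readiness_status()
        else:
            # Normal traffic, e.g. the check-status request, is still
            # served while the pod drains.
            status = 200
        body = b"ok" if status == 200 else b"shutting down"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class DrainingServer(ThreadingHTTPServer):
    # Make server_close() wait for request threads that are still running.
    block_on_close = True


def graceful_shutdown(server, drain_seconds=DRAIN_SECONDS):
    # 1. Fail the readiness probe so the pod is taken out of rotation.
    shutting_down.set()
    # 2. Keep serving while the probe fails and load balancers catch up.
    time.sleep(drain_seconds)
    # 3. Stop the accept loop; serve_forever() then returns in main().
    server.shutdown()


def main(port=8080):  # hypothetical port
    server = DrainingServer(("", port), Handler)

    def on_sigterm(signum, frame):
        # server.shutdown() blocks until serve_forever() exits, so it must
        # not run on the thread that is executing serve_forever().
        threading.Thread(
            target=graceful_shutdown, args=(server,), daemon=True
        ).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()
    # Wait for in-flight requests, then release the listening socket.
    server.server_close()
```

Call `main()` as the container's entrypoint. The key design choice is that the readiness endpoint and the real endpoints behave differently during the drain: the probe reports 503 so the pod leaves the Service's endpoints, while requests that were already routed to the pod still get a normal answer instead of a dropped connection.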
I would advise you to read these sources: