I have a Spring Boot service that runs on distributed VMs, and I want to move it to Kubernetes. Previously, we had Spring Cloud Gateway configured as a request rate limiter across those 4 VMs, but on Kubernetes my application will be autoscaled.
In that case, how can I rate-limit requests, given that Kubernetes may add or remove pods based on traffic? How can I keep track of the state of the incoming traffic while still keeping my service stateless?
Essentially, you can do rate limiting by fronting your application with a proxy (nginx, HAProxy, etc.). In Kubernetes, that proxy is typically an Ingress. More specifically, you can use the NGINX Ingress Controller and use something like the limit-rate setting in its ConfigMap, or rate-limiting annotations on the Ingress resource.
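The reason this keeps your service stateless is that the limiter's state (how many requests a client has recently made) lives in the proxy, not in your application pods, so adding or removing pods does not change the limit. If each of N pods instead kept its own limiter, the effective limit would grow with N. To make the idea concrete, here is a minimal token-bucket sketch in Java. It illustrates the general technique only; it is not the code nginx or the Ingress controller actually runs (nginx's limit_req module documents a related "leaky bucket" method). The class name TokenBucket and its parameters are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, illustrative token bucket. A proxy in front of the pods would
 * hold one of these per client key (e.g. per client IP) so the pods stay stateless.
 */
public class TokenBucket {
    private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    private final long capacity;          // maximum burst size
    private final double refillPerSecond; // sustained request rate
    private double tokens;                // tokens currently available
    private long lastRefillNanos;         // timestamp of the last refill

    public TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;           // start full, allowing an initial burst
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true if the request is allowed, false if it should be rejected (e.g. HTTP 429). */
    public synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            // Add tokens proportional to elapsed time, capped at the bucket capacity.
            tokens = Math.min(capacity, tokens + elapsed * refillPerSecond / NANOS_PER_SECOND);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Burst of 2, refilling at 1 request per second.
        TokenBucket bucket = new TokenBucket(2, 1.0, 0L);
        System.out.println(bucket.tryAcquire(0L)); // true
        System.out.println(bucket.tryAcquire(0L)); // true
        System.out.println(bucket.tryAcquire(0L)); // false: burst used up
        System.out.println(bucket.tryAcquire(TimeUnit.SECONDS.toNanos(1))); // true: one token refilled
    }
}
```

Because the bucket sits in one shared place rather than inside each replica, it enforces the same limit whether Kubernetes is running 2 pods or 20.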
You could also look into the rate-limiting feature of the Istio service mesh, which has the following concepts for rate limiting traffic in k8s: