I am using Kubernetes to create a deployment with thousands of small clients. However, the server that these clients connect to cannot handle so many requests at once. Does anyone know a way in Kubernetes to create only 100 pods per minute, wait for them to become ready, and then create the next 100?
There is no direct option for this in K8s 1.18. Here is a related GitHub issue. Some possible workarounds are:
Use multiple deployments, possibly spread across multiple namespaces. A single deployment with thousands of replicas may be hard to manage whenever anything needs to change.
Implement a random, configurable delay inside your client (or in a lightweight wrapper) to spread the load over time. Size the delay according to the total wall-clock startup time of your thousands of clients and the capacity of your server.
Implement a custom metrics agent that reports when a pod is ready.
Consider scaling the deployment with a HorizontalPodAutoscaler as needed, and use its scaling policies (the behavior field of the autoscaling/v2beta2 API, available since 1.18) to limit scale-up to 100 pods per minute.
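To make the random-delay workaround concrete, here is a minimal Python sketch of what the client or wrapper could do before its first connection. The function names, the STARTUP_MAX_DELAY_SECONDS environment variable, and the 600-second default are all hypothetical and chosen only for illustration. They are not part of Kubernetes or of any existing client.

```python
import os
import random
import time


def startup_delay(max_delay_seconds, rng=random):
    """Return a uniformly random delay between 0 and max_delay_seconds.

    A non-positive maximum disables the delay entirely.
    """
    if max_delay_seconds <= 0:
        return 0.0
    return rng.uniform(0.0, max_delay_seconds)


def wait_before_connecting(env=os.environ, sleep=time.sleep):
    """Sleep for a random delay read from a (hypothetical) env variable.

    STARTUP_MAX_DELAY_SECONDS is an assumed name; set it in the pod spec
    so the window can be tuned without rebuilding the client image.
    Returns the delay that was applied, in seconds.
    """
    max_delay = float(env.get("STARTUP_MAX_DELAY_SECONDS", "600"))
    delay = startup_delay(max_delay)
    sleep(delay)
    return delay
```

The client would call wait_before_connecting() once at startup, before opening its connection to the server. With 1000 clients and a 600-second window, that averages out to roughly 100 new connections per minute, although a uniform random spread does not guarantee a hard cap on any given minute.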