We have a Kubernetes service whose pods take some time to warm up on their first requests. Basically, the first incoming requests read cached values from Redis, and those requests take a bit longer to process. When newly created pods become ready and start receiving full traffic, they can be fairly unresponsive for up to 30 seconds, until everything has been loaded from Redis and cached locally.
I know we should restructure the application to prevent this; unfortunately, that is not feasible in the near future (we are working on it).
It would be great if it were possible to reduce the weight of newly created pods, so that they receive 1/10 of the traffic at the beginning, with the weight increasing as time passes. This would also be useful for newly deployed versions of our application, to check that they behave correctly.
Until the application can be restructured to do this "priming" internally...
Since you are running on Kubernetes, look into Container Lifecycle Hooks, specifically the postStart hook. Documentation here and an example here.
It seems that the documented behavior "...The Container's status is not set to RUNNING until the postStart handler completes" is what can help you.
There are a few gotchas, like "...there is no guarantee that the hook will execute before the container ENTRYPOINT" because "...The postStart handler runs asynchronously relative to the Container's code", and "...No parameters are passed to the handler".
Perhaps a custom script in the hook could simulate that first request, with some retry logic to wait for the application to start?
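A minimal sketch of what that could look like, assuming your image ships curl, the application listens on port 8080, and there is some endpoint whose first hit triggers the Redis cache load (the name `/some/cached/endpoint`, the port, and the Deployment name are all purely illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: app
        image: my-service:latest   # your application image
        ports:
        - containerPort: 8080
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # The hook runs asynchronously to the ENTRYPOINT, so retry
                # until the app answers, then let the priming request warm
                # the Redis-backed cache before the container is marked Running.
                for i in $(seq 1 30); do
                  curl -fsS -o /dev/null http://localhost:8080/some/cached/endpoint && exit 0
                  sleep 1
                done
                exit 0  # don't kill the container if priming never succeeds
```

Keep in mind that a failing postStart handler causes the container to be killed and restarted, which is why the script exits 0 even if the warm-up never succeeds.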
Why do you need the cache loading to happen on the first call, instead of doing it in a heartbeat/warm-up check that is hooked into the readiness probe? Another option is to make use of init containers in Kubernetes.
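A rough sketch of both ideas, assuming the application exposes (or can be given) a /ready endpoint that only returns 200 once the cache has been loaded from Redis, and that Redis is reachable via a Service named redis; both of those are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  # Init container (optional idea): runs to completion before the app
  # container starts, e.g. to wait until Redis is reachable or to pre-seed it.
  initContainers:
  - name: wait-for-redis
    image: redis:7-alpine
    command: ["sh", "-c", "until redis-cli -h redis ping; do sleep 1; done"]
  containers:
  - name: app
    image: my-service:latest
    ports:
    - containerPort: 8080
    # The pod receives traffic only after this probe succeeds, so if /ready
    # reports success only once the cache is warm, new pods never get full
    # traffic while still cold.
    readinessProbe:
      httpGet:
        path: /ready          # assumed warm-up/health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 2
      failureThreshold: 30
```

The readiness-probe route has the advantage of needing no extra tooling in the image: the application itself decides when it is warm enough to take traffic.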