I have a Kubernetes cluster running on GKE. The cluster uses nginx as a reverse proxy and as the only gateway. Behind nginx there are several deployments, each with multiple replicas, all written in Java and running on Spring Boot. I noticed that when a new pod comes up (for any reason), its initial latency is significantly higher than it should be. For example, if the average latency is 5 ms and the average 99th-percentile latency is 30 ms, on startup the 99th-percentile latency goes up to 1 second.
According to the logs in the service itself, all of the requests complete within a few milliseconds, so the latency is not coming from the application code.
My main suspect was thread availability, but plenty of threads are configured, and the pods have plenty of CPU and memory to use:
server.tomcat.max-threads=400
server.tomcat.min-spare-threads=400
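(For reference, on Spring Boot 2.3 or later these Tomcat properties were renamed, so if we are on a newer version the equivalent settings would be the following; worth double-checking that the values are actually being applied:)

server.tomcat.threads.max=400
server.tomcat.threads.min-spare=400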
Another suspect is the connection from nginx. Is it possible that Kubernetes sends requests to the new pods before they are actually ready? The readiness probe is:
readinessProbe:
  failureThreshold: 12
  httpGet:
    path: /actuator/info
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 15
I have no idea how to approach this. Any ideas?
If you use Spring Boot, you should use the health endpoints dedicated to Kubernetes (available since Spring Boot 2.3), so your readiness probe should look like this:
readinessProbe:
  failureThreshold: 12
  httpGet:
    path: /actuator/health/readiness
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 15
Otherwise, yes, it's possible that Kubernetes thinks your application is ready and sends traffic to it while it is, in fact, still starting.
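The difference matters because /actuator/info responds with 200 as soon as the web server is up, whereas /actuator/health/readiness reflects the application's ReadinessState, which Spring Boot only flips to ACCEPTING_TRAFFIC once startup has completed. If you also do your own warm-up work (caches, connection pools, and so on), you can keep the pod out of rotation until that work is done by publishing the availability state yourself. A rough sketch (the CacheWarmup class and warmUp() hook are made up for illustration):

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;

@Component
public class CacheWarmup {

    private final ApplicationEventPublisher events;

    public CacheWarmup(ApplicationEventPublisher events) {
        this.events = events;
    }

    // Hypothetical warm-up hook: keep the pod reporting "not ready" until the
    // expensive initialization is finished, then start accepting traffic.
    public void warmUp() {
        AvailabilityChangeEvent.publish(events, this, ReadinessState.REFUSING_TRAFFIC);
        // ... run cache / connection pool warm-up here ...
        AvailabilityChangeEvent.publish(events, this, ReadinessState.ACCEPTING_TRAFFIC);
    }
}

With something like that in place, /actuator/health/readiness returns OUT_OF_SERVICE (HTTP 503) until warmUp() finishes, so Kubernetes keeps the pod out of the Service endpoints and nginx never routes to it. Note that on Kubernetes the readiness/liveness health groups are enabled automatically; outside of Kubernetes you would have to enable them with management.endpoint.health.probes.enabled=true.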
Read more in the blog post: Liveness and Readiness Probes with Spring Boot.