Do you know how (if it is even possible) to reserve threads/memory for a specific endpoint in a Spring Boot microservice?
I have one microservice that accepts HTTP requests via Spring MVC, and those requests trigger HTTP calls to a third-party system, which is sometimes partially degraded and responds very slowly. I can't reduce the timeout because some of the calls are very slow by nature.
I have the spring-boot-actuator /health
endpoint enabled and I use it as the container livenessProbe
in a Kubernetes cluster. Sometimes, when the third-party system is degraded, the microservice stops responding to the /health
endpoint and Kubernetes restarts my service.
This is because I'm using the blocking RestTemplate to make those HTTP calls, so each slow call ties up a thread; threads keep piling up and the JVM starts to have memory problems.
I have thought about some solutions:
Implement a high-availability /health endpoint with reserved threads, or something like that.
Use an async HTTP client.
Implement a Circuit Breaker.
Configure custom timeouts for each third-party endpoint that I'm using (see the sketch after this list).
Create another small service (in Go) and deploy it in the same pod; this service would handle the liveness probe.
Migrate/refactor the service into smaller services, maybe with another framework/language like Vert.x, Go, etc.
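For the custom-timeouts idea, I'm picturing something like this (just a sketch; the timeout values and client names are made up):

```java
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

public class ThirdPartyClients {

    // One RestTemplate per third-party endpoint, each with its own timeouts
    private static RestTemplate withTimeouts(int connectMillis, int readMillis) {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(connectMillis);
        factory.setReadTimeout(readMillis);
        return new RestTemplate(factory);
    }

    // Short timeout for the endpoint that should answer fast,
    // long timeout for the call that is slow by nature (values illustrative)
    public static final RestTemplate FAST_ENDPOINT = withTimeouts(1_000, 2_000);
    public static final RestTemplate SLOW_ENDPOINT = withTimeouts(1_000, 120_000);
}
```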
What do you think?
Thanks in advance.
Sounds like your microservice should still respond to /health
checks while it waits for results from that third-party service it's calling.
I'd build an async HTTP server with Vert.x-Web and run a test before modifying your working code. Create two endpoints: the /health
check and a /slow
call that just waits for something like 5 minutes before replying with "hello". Deploy that in minikube or your cluster and see if it's able to respond to health checks while it's stuck on the other HTTP request.
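A minimal sketch of that test server, assuming the Vert.x 4 API (note the /slow handler uses a timer instead of Thread.sleep(), since sleeping would block the event loop, and not blocking is exactly what you're testing for):

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.Router;

public class ProbeTestServer {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        Router router = Router.router(vertx);

        // Always-fast liveness endpoint
        router.get("/health").handler(ctx -> ctx.response().end("OK"));

        // Simulates the degraded third-party call: replies "hello" after
        // 5 minutes without ever blocking the event loop
        router.get("/slow").handler(ctx ->
                vertx.setTimer(5 * 60 * 1000, timerId -> ctx.response().end("hello")));

        vertx.createHttpServer().requestHandler(router).listen(8080);
    }
}
```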
I have a prototype just wrapping up for this same problem: Spring Boot permits 100% of the available threads to be filled up with public network requests, leaving the /health endpoint inaccessible to the AWS load balancer, which knocks the service offline thinking it's unhealthy. There's a difference between unhealthy and busy... and health is more than just a process running, a port listening, or some other superficial check - it needs to be a "deep ping" that verifies the service and all its dependencies are operable, in order to give a confident health check response back.
My approach to solving the problem is to produce two new auto-wired components. The first configures Jetty with a fixed, configurable maximum number of threads (make sure your JVM is allocated enough memory to match). The second keeps a counter of each request as it starts and completes, throwing an exception that maps to an HTTP 429 TOO MANY REQUESTS response if the count approaches a ceiling of maxThreads - reserveThreads. Then I can set reserveThreads to whatever I want, and the /health endpoint is not bound by the request counter, ensuring it's always able to get in.
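Roughly, the counting component looks like this (a sketch only - the maxThreads/reserveThreads values are illustrative, it assumes a javax.servlet-based Spring Boot version, and it sends the 429 directly from a filter rather than mapping an exception):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class ReserveThreadsFilter extends OncePerRequestFilter {

    private static final int MAX_THREADS = 200;     // must match Jetty's maxThreads
    private static final int RESERVE_THREADS = 10;  // headroom kept free for /health

    private final AtomicInteger inFlight = new AtomicInteger();

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain)
            throws ServletException, IOException {
        // /health bypasses the counter so it can always get in
        if ("/health".equals(request.getRequestURI())) {
            chain.doFilter(request, response);
            return;
        }
        // Reject early once the in-flight count eats into the reserved headroom
        if (inFlight.incrementAndGet() > MAX_THREADS - RESERVE_THREADS) {
            inFlight.decrementAndGet();
            response.sendError(429, "Too Many Requests");
            return;
        }
        try {
            chain.doFilter(request, response);
        } finally {
            inFlight.decrementAndGet();
        }
    }
}
```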
I was just searching around to figure out how others are solving this problem and found your question with the same issue; so far I haven't seen anything else solid.
To configure Jetty thread settings via the application properties file, see: http://jdpgrailsdev.github.io/blog/2014/10/07/spring_boot_jetty_thread_pool.html
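On Spring Boot 2.x the Jetty component can also be a factory customizer along these lines (a sketch; the linked post shows the Boot 1.x JettyEmbeddedServletContainerFactory equivalent):

```java
import org.eclipse.jetty.util.thread.QueuedThreadPool;
import org.springframework.boot.web.embedded.jetty.JettyServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JettyThreadPoolConfig {

    @Bean
    public WebServerFactoryCustomizer<JettyServletWebServerFactory> jettyCustomizer() {
        // Fixed-size pool (max == min) so the thread count is predictable
        // and the maxThreads - reserveThreads math in the filter holds
        return factory -> factory.setThreadPool(new QueuedThreadPool(200, 200));
    }
}
```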
The actuator health endpoint is very convenient with Spring boot - almost too convenient in this context, as it does deeper health checks than you necessarily want in a liveness probe. For readiness you want to do deeper checks, but not for liveness. The idea is that if the Pod is overwhelmed for a bit and fails readiness, it will be withdrawn from load balancing and get a breather. But if it fails liveness, it will be restarted. So you want only minimal checks in liveness (Should Health Checks call other App Health Checks). By using actuator health for both, there is no way for your busy Pods to get a breather, as they get killed first. And Kubernetes is periodically calling the HTTP endpoint when performing both probes, which contributes further to your thread usage problem (do consider the periodSeconds on the probes).
For your case you could define a liveness command instead of an HTTP probe - https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-command. The command could just check that the Java process is running (so kinda similar to your Go-based probe suggestion).
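For example, a sketch of such probes (the pgrep command assumes the tool is available in your image; the port and timings are illustrative):

```yaml
livenessProbe:
  exec:
    command: ["sh", "-c", "pgrep java > /dev/null"]   # just: is the JVM alive?
  initialDelaySeconds: 30
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health      # deeper actuator check, only gates load balancing
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
```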
For many cases using the actuator for liveness would be fine (think of apps that hit a different constraint before threads, which would be your case if you went async/non-blocking with the reactive stack). Yours is one where it can cause problems - the actuator's probing of the availability of dependencies like message brokers can be another where you get excessive restarts (in that case on first deploy).