I am maintaining a couple of Spring Boot web service applications (WAR deployments), currently running on four identical Tomcat instances.
A load balancer in front makes sure traffic is spread across the four instances.
We do manual rolling deployment.
Before taking an instance down for upgrade, we divert new traffic away from it. We then give active requests a grace period of two minutes, before terminating the applications.
Now I am in the process of migrating these applications to OpenShift. This is all going very well, except that I have a hard time making the rolling deployment work to my satisfaction.
Googling for help, I arrived at a solution based on:
At first this seemed to work, but it turns out that the liveness probe sometimes kicks in and kills the pod, even though the ShutdownHook hasn't finished yet.
If I remove the liveness probe it works, but I don't see that as a real solution.
Experiments have revealed that once the ShutdownHook pauses the Tomcat connector, the actuator/health endpoint responds with "connection refused" - which makes sense, but is not what I need, because it makes the liveness probe deem the application dead.
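For context, the grace-period idea behind the ShutdownHook can be sketched in plain Java. This is a simplified, framework-free illustration: the executor stands in for Tomcat's worker pool, and the real hook would call `connector.pause()` instead of `shutdown()`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {

    /**
     * Stop accepting new work, then wait up to graceSeconds for active
     * tasks to finish. Returns true if everything drained in time.
     */
    static boolean pauseAndDrain(ExecutorService workers, long graceSeconds)
            throws InterruptedException {
        // Step 1: stop accepting new work
        // (the real ShutdownHook pauses the Tomcat connector here).
        workers.shutdown();
        // Step 2: give active requests a grace period before terminating.
        return workers.awaitTermination(graceSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        // Simulate one in-flight request that takes half a second.
        workers.submit(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean drained = pauseAndDrain(workers, 120);
        System.out.println(drained ? "drained" : "grace period expired");
    }
}
```

The problem described above is exactly that while step 2 is in progress, the paused connector no longer answers the liveness probe.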
I have tried moving the actuator endpoints to another port number, but this is even worse: they now stop responding immediately when the shutdown starts.
I assume this is because the actuator endpoints now belong to a Tomcat connector different from my main connector, one that is not under the control of my main Spring application context.
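For reference, the separate management port was configured with the standard Spring Boot properties (the names below are the Spring Boot 2.x forms; 1.x used `management.port`):

```properties
# Expose the actuator endpoints on their own connector/port
management.server.port=8081
management.endpoints.web.exposure.include=health
```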
Can any of you tell me how to stall the shutdown of the actuator endpoints when they run on a separate port number?
Or any other suggestion really - allowing me to:
Given that you just want to prevent traffic from reaching your pod while it performs a graceful shutdown, you could use a readiness probe with a short timeout that, upon failure, removes your pod from the Service's list of endpoints. Then increase your liveness probe's timeout and failure threshold to allow your pod plenty of time to shut down gracefully, while still having a fallback in case the pod truly is stuck.
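That split can look roughly like this in the pod spec (the values and the `/actuator/health` path are illustrative; tune the liveness thresholds to comfortably exceed your two-minute grace period):

```yaml
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 5
  failureThreshold: 1      # drop the pod from the Service quickly
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 30
  failureThreshold: 6      # ~3 minutes before a truly stuck pod is killed
```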