Spring Boot app + Kubernetes liveness/readiness checks

1/19/2019

I'm building a few Spring Boot microservices that are getting deployed in a Kubernetes (AKS, specifically) cluster. I was planning on setting the probe paths for the liveness and readiness checks to both point at the actuator health endpoint, but I was wondering whether that might not be the best option. My original thinking was that checking the path would be useful (at least for readiness) so that traffic wouldn't be sent to the app until Spring has started up and is capable of handling requests. Since these services use a database connection, and the actuator health indicator will report the status as down if it can't make a connection, might that not be such a good idea?

With liveness, I'm thinking it might start recycling the pods/containers over and over even though (in the case where the DB is down) that might not fix anything.

With readiness, I'm thinking it might cause the pool of available apps to drop to 0 if the DB is down. The app itself will most likely not be very useful if the DB is down, but parts of it may potentially still work, I suppose.

Is there a recommended best practice for this type of thing?

-- chinabuffet
kubernetes
spring-boot
spring-boot-actuator

4 Answers

11/19/2019

We have used a custom Spring Boot Actuator health check for the liveness and readiness checks. You can add your own logic to determine whether you are able to serve requests: if you are, keep the pod alive; otherwise, restart it. For database connection issues, a restart will only help if your connections are stuck and not being released.
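For illustration, a minimal sketch of what such a custom Actuator health check might look like; the class name and the canServeRequests() check are placeholders for whatever logic decides if the pod can serve traffic:

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Hypothetical custom health indicator; contributes to /actuator/health.
@Component
public class CustomProbeHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        if (canServeRequests()) {
            return Health.up().build();
        }
        return Health.down().withDetail("reason", "not able to serve requests").build();
    }

    private boolean canServeRequests() {
        // Placeholder: check internal state, connection pools, caches, etc.
        return true;
    }
}
```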

-- dassum
Source: StackOverflow

11/19/2019

We are using the standard /actuator/health endpoint for both liveness and readiness, and have been for close to a year now. The positive side of this is that the application is not marked as ready for use unless all of its connections are up and running. The downside is that some cases of buggy connections have led to downtime/restarts.

In my mind, an application that holds no connection to its database (or other important infrastructure) is as good as useless. Since it probably won't function properly, you might as well report that it is unavailable. So unless you have problems with a poor connection to the database or similar, I can't really see the harm in using /actuator/health for both liveness and readiness. It is also a cheap way of checking whether your application is up and running, and it requires very little manual work to set up.
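As a rough sketch, a container spec for this setup might look like the following; the names, port, and timing values are assumptions, not recommendations:

```yaml
# Fragment of a Deployment pod spec: both probes point at the standard
# Actuator health endpoint. Names and numbers are illustrative only.
containers:
  - name: my-spring-boot-app          # hypothetical container name
    image: my-registry/my-app:1.0.0   # hypothetical image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /actuator/health
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10
      failureThreshold: 3
```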

-- Tobb
Source: StackOverflow

11/22/2019

ReadinessProbe - is the app ready to handle requests?

Use a health check to determine whether the app is ready to handle new requests. This can be implemented with /actuator/health. Also see StartupProbe below.

Under high load?

If your app is under high load, it may not be able to respond to the health check in time, causing the ReadinessProbe to fail. Consider using a Horizontal Pod Autoscaler to get more replicas to handle the load.
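A minimal HorizontalPodAutoscaler sketch along those lines, assuming the autoscaling/v2 API on a recent cluster and a hypothetical Deployment named my-app; it scales on CPU so individual pods are less likely to be too busy to answer probes:

```yaml
# Illustrative HPA: scales the Deployment on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```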

LivenessProbe - is the app deadlocked?

If your app gets into an unrecoverable state, it is best if it can terminate itself, e.g. using java.lang.System.exit(1). If the app can become deadlocked and unable to proceed, consider implementing an endpoint for the LivenessProbe; this may be the same endpoint as for the ReadinessProbe.

Not responding to readiness for a long time

If your app hasn't responded to the ReadinessProbe for a long time, e.g. many minutes, something is probably wrong (unless you expect this to happen for your app). In that case you should probably also use /actuator/health as your LivenessProbe, but with a higher failureThreshold and a high initialDelaySeconds (e.g. a few minutes).
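A sketch of such a more tolerant liveness probe, reusing /actuator/health; the port and the numbers are illustrative only:

```yaml
# Illustrative liveness probe that tolerates several minutes of failures
# before the container is restarted.
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080                # assumed application port
  initialDelaySeconds: 180    # give the app a few minutes to start
  periodSeconds: 30
  failureThreshold: 10        # ~5 minutes of consecutive failures before restart
```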

StartupProbe - better alternative on Kubernetes 1.16+

The ReadinessProbe is most useful during app startup, since the app may need to load e.g. data before it is ready to receive requests - but the ReadinessProbe is executed periodically throughout the pod lifecycle. The StartupProbe is now a better alternative for slow-starting apps, in combination with a LivenessProbe that only becomes active after the StartupProbe has succeeded. You may still need a ReadinessProbe to signal that the pod is ready to handle requests.
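A sketch of that combination, assuming the app serves /actuator/health on port 8080; the thresholds are illustrative:

```yaml
# Illustrative pairing: the startupProbe guards slow startup (Kubernetes 1.16+),
# and the livenessProbe only starts running once the startupProbe has succeeded.
startupProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30        # allows up to ~5 minutes of startup time
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```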

Depending on other services

If your app depends on other services that are not healthy, it is better if your app can recover from those situations when the backing service comes up again, e.g. by reconnecting. Otherwise you get a domino chain reaction: a whole chain of services stops responding to the ReadinessProbe or LivenessProbe because the last app in the chain has a problem. Consider providing a degraded service instead, reporting that you are not in full service; some of your endpoints may still work correctly.

Use Management Server Port

It is the kubelet on the same node that sends the probe requests. Consider using a management server port for the probes. You don't need to expose this port through the Service; it is better to use one port for HTTP traffic and another for management.
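One possible sketch of that split, assuming the Actuator endpoints are moved to a separate management port; 8080/8081 are arbitrary choices here:

```yaml
# application.yaml (Spring Boot): serve Actuator on a separate management port.
server:
  port: 8080
management:
  server:
    port: 8081
---
# Deployment snippet: probes target the management port, which is not
# exposed through the Service that receives client traffic.
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 8081
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8081
```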

Cloud provider Load Balancer Service health check

If you are using a cloud provider load balancer, it may do health checks on your Services, and you may need to configure the path it sends health checks to, e.g. Google Cloud Platform defaults to /. This is a health check for the Service, not for the individual Pods.

-- Jonas
Source: StackOverflow

5/19/2020

As of Spring Boot 2.3, the Availability state of the application (including Liveness and Readiness) is supported in the core and can be exposed as Kubernetes Probes with Actuator.

Your question is spot on and this was discussed at length in the Spring Boot issue for the Liveness/Readiness feature.

The /health endpoint was never really designed to expose the application state and to drive how the cloud platform treats the app instance and routes traffic to it. It has been used that way quite a lot, since Spring Boot didn't have anything better to offer here.

The Liveness check should only fail when the internal state of the application is broken and we cannot recover from it. As you've underlined in your question, failing here as soon as an external system is unavailable can be dangerous: the platform might recycle all application instances that depend on that external system (maybe all of them?) and cause cascading failures, since other systems might depend on that application as well.

By default, the Liveness probe will reply with "Success" unless the application itself has changed that internal state.

The Readiness probe is really about the ability of the application to serve traffic. As you've mentioned, some health checks might reflect the state of essential parts of the application, and some might not. Spring Boot will synchronize the Readiness state with the lifecycle of the application (the web app has started, a graceful shutdown has been requested and we shouldn't route traffic anymore, etc.). There is a way to configure a "readiness" health group to contain a custom set of health checks for your particular use case.
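As an illustration, a configuration sketch for such a group, assuming Spring Boot 2.3+ and a DataSource health indicator registered under the name db:

```yaml
# application.yaml sketch: expose the liveness/readiness probe endpoints and
# add the database check to the readiness group. The "db" indicator name
# assumes a DataSource health indicator is present in the app.
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
      group:
        readiness:
          include: readinessState,db
```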

I disagree with a few statements in the answer that received the bounty, especially because a lot has changed in Spring Boot since then:

  1. You should not use /actuator/health for Liveness or Readiness probes as of Spring Boot 2.3.0; use the dedicated liveness and readiness endpoints instead (see the sketch after this list).
  2. With the new Spring Boot lifecycle, you should move all long-running startup tasks into ApplicationRunner beans - they will be executed after Liveness is Success, but before Readiness is Success. If the application startup is still too slow for the configured probes, you should use the StartupProbe with a longer timeout and point it to the Liveness endpoint.
  3. Using the management port can be dangerous, since it uses a separate web infrastructure. For example, the probes exposed on the management port might be OK while the main connector (serving the actual traffic to clients) is overwhelmed and cannot serve more traffic. Reusing the same server and web infrastructure for the probes can be safer in some cases.
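Following point 1, a sketch of probes pointing at the dedicated Spring Boot 2.3+ endpoints instead of /actuator/health; port and timing values are assumptions:

```yaml
# Illustrative probes using the dedicated availability endpoints.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```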

For more information about this new feature, you can read the dedicated Kubernetes Liveness and Readiness Probes with Spring Boot blog post.

-- Brian Clozel
Source: StackOverflow