Recovering from failure in orchestrated deployments

7/4/2017

In monolithic systems without orchestration, when there’s a temporary problem accessing a resource (e.g. connecting to a database), the typical approach is to keep retrying until the resource recovers.
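To make the in-application approach concrete, here is a minimal sketch of such a retry loop. The names `connect_with_retry`, `max_attempts` and `base_delay` are hypothetical, and the bounded attempt count and exponential back-off are assumptions for illustration; a real service might retry indefinitely or use a different delay policy.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is exhausted.

    `connect` is any zero-argument callable that raises ConnectionError
    on a temporary failure (a hypothetical stand-in for a database client).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide what to do.
                raise
            # Exponential back-off: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Even in this small form, the application has to own the attempt counter, the delay policy and the decision about when to give up.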

In systems with a microservices architecture, where the boot process is typically light, removing the retry logic from the application and aborting the process instead, so that the orchestrator restarts it, can reduce the application's complexity. If the orchestrator can deal with service dependencies, it might even know exactly what needs to be recovered and when it is appropriate to start the service again. There’s no “blind” retry.
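As a rough sketch of that fail-fast alternative, the service tries the dependency once and returns a non-zero exit code if it is unavailable. The names `main`, `connect` and `serve` are hypothetical, and the sketch assumes the orchestrator restarts processes that exit with a non-zero status (in Kubernetes, for instance, a container under `restartPolicy: Always` or `OnFailure` is restarted by the kubelet with a back-off delay).

```python
import sys


def main(connect, serve):
    """Try the dependency once; return a process exit code.

    `connect` and `serve` are hypothetical callables standing in for
    the real dependency client and the service's main loop.
    """
    try:
        conn = connect()
    except ConnectionError as err:
        # No retry here: report the failure and let the orchestrator restart us.
        print(f"dependency unavailable, exiting: {err}", file=sys.stderr)
        return 1
    serve(conn)
    return 0


# In the service's entry point one would then call, for example:
#     sys.exit(main(connect_to_database, run_server))
# where connect_to_database and run_server are the application's own functions.
```

The retry policy now lives entirely in the orchestrator's restart configuration, which is the complexity trade being asked about.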

If the service holds persistent connections from clients, terminating it might be a problem. Other than that, I think terminating the process is an approach worth considering.

Does anyone have any experience with this that they can share? Feedback would be very helpful.

-- rgoncalves
architecture
kubernetes
microservices

1 Answer

7/4/2017
-- Javier Salmeron
Source: StackOverflow