I am exploring different strategies for handling shutdown gracefully in case of a deployment or crash. I am using the Spring Boot framework and Kubernetes. In a few of the services, we have tasks that can take around 10-20 minutes (data processing, large report generation). How to handle pod termination in these cases when the task is taking more time? For queuing I am using Kafka.
we have tasks that can take around 10-20 minutes (data processing, large report generation)
First, this is more of a Job/Task than a microservice. But similar "rules" apply: the node where the job is executing might be terminated for an upgrade or some other reason, so your Job/Task must be idempotent and able to be re-run if it crashes or is terminated.
How to handle pod termination in these cases when the task is taking more time? For queuing I am using Kafka.
Kafka is a good technology for this, because it lets the client Job/Task be idempotent. The job receives the data to process and, after processing, it can "commit" that it has processed the data. If the Task/Job is terminated before it has processed and committed the data, a new Task/Job will spawn and continue processing from the "offset" that is not yet committed. The record that was in flight may therefore be processed a second time, which is why the work itself has to be safe to repeat.
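To make the commit-after-processing idea concrete, here is a minimal sketch in plain Java. It does not talk to Kafka: the `log` list and the `committedOffset` field are hypothetical in-memory stand-ins for a partition and its committed offset, and `runWorker` with its `crashAtOffset` parameter is an invented helper that simulates a pod being killed after the work is done but before the commit. With the real client you would typically set `enable.auto.commit=false` and call `KafkaConsumer.commitSync()` once a record has been fully processed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommitAfterProcessingSketch {
    // Hypothetical stand-in for a Kafka partition: an ordered list of records.
    final List<String> log = List.of("report-1", "report-2", "report-3");

    // Hypothetical stand-in for the consumer group's committed offset.
    int committedOffset = 0;

    // Every processing attempt, including repeats after a crash.
    final List<String> attempts = new ArrayList<>();

    // Results keyed by record, so repeating a record does not duplicate output.
    final Set<String> completed = new HashSet<>();

    // Runs one worker from the last committed offset. If crashAtOffset >= 0,
    // the worker "dies" after processing that record but before committing it.
    void runWorker(int crashAtOffset) {
        for (int offset = committedOffset; offset < log.size(); offset++) {
            String record = log.get(offset);
            attempts.add(record);
            completed.add(record); // idempotent: adding twice changes nothing
            if (offset == crashAtOffset) {
                return; // simulated pod termination; the offset is NOT committed
            }
            committedOffset = offset + 1; // commit only after the work is done
        }
    }

    public static void main(String[] args) {
        CommitAfterProcessingSketch sketch = new CommitAfterProcessingSketch();
        sketch.runWorker(1);   // first pod is killed while handling offset 1
        sketch.runWorker(-1);  // replacement pod resumes from the committed offset
        System.out.println("attempts = " + sketch.attempts);
        System.out.println("completed = " + sketch.completed.size()
                + ", committed offset = " + sketch.committedOffset);
    }
}
```

In this run `report-2` is attempted twice, once by each worker, but because the result is stored idempotently there are still only three completed results. That is the at-least-once behaviour you should plan for with 10-20 minute tasks: a termination costs you a repeat of the in-flight task, not lost or duplicated output.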