How to implement graceful termination of nodes without service downtime when using the Cluster Autoscaler?

5/20/2019

I have set up a Kubernetes cluster on EKS. The Cluster Autoscaler (CA) is configured to increase or decrease the number of nodes based on the resources available for pods. The CA terminates a node if it is unneeded and its pods can be scheduled on another node. However, the CA terminates the node before the pods are rescheduled, so the pods are only scheduled on another node after the original node has been terminated. As a result, some services are down until the rescheduled pods become healthy.

How can I avoid the downtime by ensuring that the pods get scheduled on another node before the node gets terminated?

The graceful termination period for nodes is set to 10 minutes (the default).

-- Nitesh
amazon-eks
autoscaling
high-availability
kubernetes

1 Answer

5/20/2019

You need to run multiple replicas of your application. That allows it to keep serving even if a node dies suddenly. You may also want to add antiAffinity rules to your app manifest to ensure that the replicas run on different nodes.
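
As a rough illustration, a Deployment with several replicas and a pod anti-affinity rule might look like the sketch below. The name `my-app`, the `app: my-app` label, and the `nginx:1.25` image are placeholders chosen for this example, not taken from the question; adjust them to your own workload.

```yaml
# Hypothetical Deployment: name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3   # more than one replica, so losing a single node leaves others serving
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: do not schedule two pods labelled app=my-app on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: nginx:1.25
```

With the `required...` form, replicas beyond the number of available nodes stay Pending. If that is too strict for your cluster, `preferredDuringSchedulingIgnoredDuringExecution` expresses the same intent as a soft preference.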

-- Vasily Angapov
Source: StackOverflow