How to safely terminate Pods under load

9/21/2020

There are a few scenarios where Pods need to be terminated without reducing a Deployment's total capacity:

  • Draining nodes for maintenance
  • Draining nodes for binpacking
  • Misbehaving Pods / handling memory leaks without triggering OOM Killer

Imagine a situation where we have two Pods taking considerable traffic. If we use the default workflow of terminating a Pod and letting Kubernetes react by re-creating it, we'll be running at 50% of our processing capacity for some arbitrary amount of time.
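
Concretely, with made-up Pod and label names, that default workflow looks roughly like this:

    # Delete one of the two Pods and let the ReplicaSet re-create it;
    # we run at 50% capacity until the replacement becomes Ready.
    kubectl get pods -l app=my-app               # two Ready Pods
    kubectl delete pod my-app-7d9f8b6c4-abcde    # down to one Pod from here on
    kubectl get pods -l app=my-app --watch       # watch the replacement go Pending -> Running -> Ready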

In high-throughput applications, this will degrade the service level in one or more of the following ways:

  • Request queueing in non-multithreaded non-async applications like Rails increases response time
  • Higher context switching, in the case of async multithreaded applications, increases response time
  • Timeout error spikes in applications with strict response time SLOs and timeouts
  • Cascading slowdowns in services that depend on the high-throughput application in question, if we don't enforce strict response time SLOs and timeouts

Ideally what I'd look for is some create-before-destroy pattern. Something like: we ask Kube to terminate a Pod, but before removing it from the Endpoints of any Service it's listed in, it triggers a scale-up, respects the Readiness Gate, and only then starts terminating the Pod we asked it to terminate. I haven't found any mention of such a pattern in Kubernetes.
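
The closest built-in behaviour I can point to is what Deployments already do for rollouts: with a surge configured, a rolling update brings the new Pod up before taking the old one down. A rough sketch with a made-up Deployment name, noting that this only covers rollouts/restarts, not deleting one specific Pod:

    # Surge by one Pod and never go below the desired replica count during rollouts.
    kubectl patch deployment my-app --type merge \
      -p '{"spec": {"strategy": {"type": "RollingUpdate", "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}}}}'

    # A restart then replaces Pods one at a time, new-before-old.
    kubectl rollout restart deployment/my-app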

How do folks deal with this scenario? I can imagine the following:

  • Lower the targetAverageUtilization of the HPA so we can tolerate a temporary 50% capacity reduction, but this means we'll pay for more capacity than we want
  • Optimize Pod time-to-ready so we stay underprovisioned for only a few seconds, but this seems very difficult when, for instance, AWS Load Balancers take at least 10 seconds to mark a new target as healthy
  • Create an involved workflow (sketched as a script after this list) where, instead of kubectl delete [pod], we:
    • Increase the HPA's minReplicas to one above the current replica count
    • Wait for the new Pod to become Ready
    • Wait a few seconds for it to warm up
    • Run kubectl delete [pod] to kill the desired Pod
    • Wait for the replacement Pod to become Ready
    • Wait a few seconds for it to warm up
    • Restore the original minReplicas in the HPA
    • Run the risk of this whole operation happening concurrently with a traffic increase/spike and seeing any of the degradation effects I listed above
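
A rough sketch of that workflow as a script, assuming a Deployment and HPA both named my-app and 30-second warm-ups (all placeholders), with nothing here protecting against a concurrent traffic spike:

    #!/usr/bin/env bash
    set -euo pipefail

    DEPLOY=my-app        # hypothetical Deployment name
    HPA=my-app           # hypothetical HPA name
    POD_TO_KILL=$1       # the Pod we actually want gone

    CURRENT=$(kubectl get deploy "$DEPLOY" -o jsonpath='{.spec.replicas}')
    ORIG_MIN=$(kubectl get hpa "$HPA" -o jsonpath='{.spec.minReplicas}')

    # Wait until the Deployment reports at least $1 Ready Pods.
    wait_for_ready() {
      until [ "$(kubectl get deploy "$DEPLOY" -o jsonpath='{.status.readyReplicas}')" -ge "$1" ]; do
        sleep 5
      done
    }

    # 1. Raise minReplicas to one above the current replica count so the HPA scales up.
    kubectl patch hpa "$HPA" --type merge -p "{\"spec\": {\"minReplicas\": $((CURRENT + 1))}}"

    # 2. Wait for the extra Pod to become Ready, then let it warm up.
    wait_for_ready "$((CURRENT + 1))"
    sleep 30

    # 3. Delete the Pod we wanted to terminate and wait for its replacement.
    kubectl delete pod "$POD_TO_KILL"
    wait_for_ready "$((CURRENT + 1))"
    sleep 30

    # 4. Restore the original minReplicas.
    kubectl patch hpa "$HPA" --type merge -p "{\"spec\": {\"minReplicas\": $ORIG_MIN}}"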

None of these seem good, though. The last one, especially, won't cover binpacking, since we can't change how the Cluster Autoscaler drains instances.
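
For drains specifically, the only native knob I know of is a PodDisruptionBudget, which can stop an eviction from dropping below a minimum but never surges a replacement first, so it doesn't really solve this either. A minimal sketch with made-up names:

    # Evictions (kubectl drain, Cluster Autoscaler) that would violate the
    # budget are refused; the replacement Pod is still only created after
    # the old one is gone.
    kubectl apply -f - <<EOF
    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: my-app
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: my-app
    EOF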

-- Juliano
hpa
kubernetes

1 Answer

9/21/2020
-- Max Lobur
Source: StackOverflow