There are a few scenarios where Pods need to be terminated without reducing a Deployment's total capacity.
Imagine a situation where we have two Pods taking considerable traffic. If we use the default workflow of terminating a Pod and letting Kubernetes react by re-creating it, we'll be at 50% of our processing capacity for some arbitrary amount of time.
In high-throughput applications this will degrade the service level.
Ideally what I'd look for is some create-before-destroy pattern. Something like: we ask Kube to terminate a Pod, but before removing it from the Endpoints of any Service it's listed in, it triggers a scale-up, waits for the new Pod's Readiness Gate, then starts terminating the Pod we asked it to terminate. I haven't found any mention of such a pattern in Kubernetes.
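The closest built-in I'm aware of is the Deployment rolling-update surge, which gives create-before-destroy semantics during rollouts, but not when deleting an individual Pod, which is exactly the gap above. A minimal sketch, where the name, image, and probe endpoint are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # create the replacement Pod first...
      maxUnavailable: 0        # ...and never dip below desired capacity
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest   # placeholder image
          readinessProbe:        # new Pod must be Ready before an old one is removed
            httpGet:
              path: /healthz     # placeholder endpoint
              port: 8080
```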
How do folks deal with this scenario? I can imagine the following:

- Lower the `targetAverageUtilization` of the HPA, so we can tolerate a temporary 50% capacity reduction, but this means we'll pay for more capacity than we need.
- Before `kubectl delete [pod]` we:
  1. raise `minReplicas` to one above the current replica count
  2. `kubectl delete [pod]` to kill the desired Pod
  3. restore `minReplicas` in the HPA
  (a scripted sketch of this sequence follows below)

None of these seem good tho. The last one, especially, won't cover binpacking, as we can't replace how the Cluster Autoscaler drains instances.
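For reference, the second option can be scripted. A rough sketch, assuming an HPA and Deployment both named `my-app` currently sitting at 2 replicas, and a placeholder Pod name:

```sh
# Raise the HPA floor to one above the current replica count (2 -> 3)
kubectl patch hpa my-app --type merge -p '{"spec":{"minReplicas":3}}'

# Wait until the extra Pod is Ready, so capacity never drops
until [ "$(kubectl get deploy my-app -o jsonpath='{.status.readyReplicas}')" = "3" ]; do
  sleep 2
done

# Kill the Pod we actually wanted gone (placeholder name)
kubectl delete pod my-app-abc123

# Restore the original HPA floor
kubectl patch hpa my-app --type merge -p '{"spec":{"minReplicas":2}}'
```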
Disruption Budget: set `minAvailable` to 90% and that will be it.
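If I'm reading that right, it would look something like this (name and selector are placeholders). One caveat: a PodDisruptionBudget guards evictions, e.g. node drains via `kubectl drain`, not a direct `kubectl delete pod`.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb           # placeholder name
spec:
  minAvailable: "90%"        # at most 10% of matching Pods may be evicted at once
  selector:
    matchLabels:
      app: my-app            # placeholder label
```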