I raised the pod replicas to around 50 in a cluster, watched it scale out, and then dropped the replicas back to 1. It turns out I had disabled scale-down for one node, and I noticed that Kubernetes left the remaining replica on that node. However, when the annotation preventing scale-down is not present, I've seen the autoscaler remove that node instead. So somehow Kubernetes seems to make its choice based on some knowledge of the nodes, or at least the oldest Pod happens to be the one on that node. Or something else altogether.
After scaling down k8s pod replicas, how does Kubernetes choose which pods to terminate?
Roughly speaking, it tries to keep the replicas spread evenly across nodes. You can find the selection logic in the ReplicaSet controller: https://github.com/kubernetes/kubernetes/blob/edbbb6a89f9583f18051218b1adef1def1b777ae/pkg/controller/replicaset/replica_set.go#L801-L827. If the per-node counts are the same, the choice is effectively random.