I have read up on the Kubernetes docs but I'm unable to get a clear answer to my question. I'm using the official cluster-autoscaler.
From what I understand, seamless updates are easy with the RollingUpdate strategy. I have not found an equivalent "rolling" strategy for scale-down.
EDIT
TL;DR I'm looking for HA on a) a deployment with two or more replicas and b) a deployment with one replica.
a) Can be achieved by using PDBs. Check out Fritz's answer. If you need the pods scheduled on different nodes, use anti-affinity (Marc's answer).
b) If you're okay with a short disruption, a PDB is the official way to go. If you need a workaround, my answer may serve as inspiration.
I hate to answer my own question, but an easy way to get a highly available service with only one pod (without wasting resources on an idle replica) is to use a preStop hook (which makes termination blocking if the application does not handle SIGTERM properly) together with a terminationGracePeriodSeconds long enough for the replacement pod to start.
Contrary to what has been said here, the scheduling happens while the pod is terminating. I ran a quick test (which I should have done while reading the docs): I created a busybox (sh sleep 3600) deployment with one replica and terminationGracePeriodSeconds set to 240 seconds.
After I deleted the pod, it entered the Terminating state and stayed there for 240 seconds. Immediately after the pod was marked as Terminating, a new pod was scheduled to replace it. So the old pod has time to finish whatever it is doing, and the new one can seamlessly take its place.
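For reference, a manifest along the lines of that test might look like the sketch below. The 240-second grace period and the busybox sleep come from the test described above; the name `single-replica-demo`, the `app` label, and the 30-second preStop sleep are placeholders I made up for illustration, so adjust them to your workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-replica-demo        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: single-replica-demo     # hypothetical label
  template:
    metadata:
      labels:
        app: single-replica-demo
    spec:
      # Upper bound on how long the pod may stay in Terminating
      # before it is force-killed (value taken from the test above).
      terminationGracePeriodSeconds: 240
      containers:
        - name: busybox
          image: busybox
          command: ["sh", "-c", "sleep 3600"]
          lifecycle:
            preStop:
              exec:
                # Assumed delay: runs before SIGTERM is sent, giving the
                # replacement pod time to start. It counts against the
                # grace period, so keep it shorter than 240 seconds.
                command: ["sh", "-c", "sleep 30"]
```

Deleting the pod with `kubectl delete pod <pod-name>` and watching `kubectl get pods -w` in another terminal should show the replacement being created while the old pod is still Terminating.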
I haven't tested how the networking will behave, since the LB will stop sending new requests to the terminating pod, but I assume the downtime will be much lower than with terminationGracePeriodSeconds left at its default.
Beware that this is not official by any means, but it serves as a workaround for my use case.
The scale-down behavior can be configured with what is called a Pod Disruption Budget (PDB).
In a PodDisruptionBudget manifest, which is a separate object from your Deployment, you can define either maxUnavailable or minAvailable for your Pods during voluntary disruptions such as draining nodes.
For how to do it, check out the K8s Documentation.
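As a rough sketch, a PDB for a deployment with two or more replicas could look like this. The name `my-app-pdb` and the `app: my-app` label are assumptions; the selector has to match the labels on your own pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  # Keep at least one matching pod running during voluntary
  # disruptions. Set either minAvailable or maxUnavailable, not both.
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app           # assumed label; must match your pods
```

Note that with a single replica, `minAvailable: 1` would block node drains for that pod altogether, so this shape fits the multi-replica case best.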
Below are some insights; I hope they help:
If you use a Deployment, its controller makes sure that you always have the desired number of replicas running, no less and no more. So when you kill a node that hosts one of your replicas, the new pod will be scheduled after the termination of one of your original replicas. It's up to you to plan ahead if it's a planned maintenance.
If you have several nodes (meaning more than one) and want to achieve HA (high availability) for your deployments, then you should have a look at pod affinity/anti-affinity. You can find out more in the official docs.
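To illustrate the anti-affinity idea, here is a minimal sketch of a Deployment whose replicas must land on different nodes. The name, label, and nginx image are placeholders, not something from the question.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two pods with this label on the
          # same node (nodes are told apart by their hostname label).
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: nginx              # placeholder image
```

With the `required...` form, a replica stays Pending if no other node is available; `preferredDuringSchedulingIgnoredDuringExecution` is the softer variant if that is too strict.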