I have a setup with a 1-to-1 mapping between pods and nodes, where each node has a local SSD and each pod writes custom data to that SSD. If a pod dies ungracefully, it can leave the local SSD in a state where behaviour would be unpredictable if another pod mounted it.
I'm planning on using cluster autoscaling, so my thinking is that if I can prevent containers from being scheduled on the node, GCE will remove the node and create a new, clean one. How do I go about preventing new pods from being scheduled to the node?
I would run a scheduled task (probably a CronJob) that checks the volume state. If the volume is corrupted, add a label to the node, for example volume-state=corrupted.
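A minimal sketch of such a CronJob, assuming a hypothetical check-volume.sh script in the image that exits non-zero on corruption, and a node-labeler ServiceAccount with RBAC permission to patch nodes (both names are placeholders, not part of the original setup):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: volume-state-check
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: node-labeler   # assumed SA allowed to patch nodes
          restartPolicy: Never
          containers:
          - name: check
            image: bitnami/kubectl:latest    # any image with kubectl works
            env:
            - name: NODE_NAME                # downward API: node the pod landed on
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            volumeMounts:
            - name: local-ssd
              mountPath: /mnt/disks/ssd0
            command:
            - /bin/sh
            - -c
            - |
              # /check-volume.sh is hypothetical: exits non-zero on corruption
              if ! /check-volume.sh /mnt/disks/ssd0; then
                kubectl label node "$NODE_NAME" volume-state=corrupted --overwrite
              fi
          volumes:
          - name: local-ssd
            hostPath:
              path: /mnt/disks/ssd0          # assumed local-SSD mount point
```

Note that a CronJob pod only checks whichever node it happens to land on; since you have one pod per node, you'd want one checker per node as well, e.g. by running the same check loop from a DaemonSet instead.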
Then schedule your pods with a nodeAffinity rule that excludes nodes carrying that label, using an operator: NotIn match (a plain nodeSelector can only match labels positively, so it can't express "not corrupted"). With requiredDuringSchedulingIgnoredDuringExecution, no new pod will land on the flagged node, and once the node is empty the cluster autoscaler can remove it and bring up a clean one.
One caveat: the requiredDuringSchedulingRequiredDuringExecution variant, which would also evict pods from nodes that cease to satisfy the pods' node affinity requirements, is still only planned and not implemented in Kubernetes, so the label alone won't evict a pod that is already running there (a NoExecute taint on the node would).
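A matching pod spec might look like this sketch; the pod name, image, and /mnt/disks/ssd0 hostPath are placeholders. NotIn also matches nodes that don't have the volume-state label at all, so healthy nodes never need to be labelled up front:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-writer
spec:
  affinity:
    nodeAffinity:
      # hard requirement at scheduling time: skip nodes whose
      # volume-state label is "corrupted"
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: volume-state
            operator: NotIn
            values: ["corrupted"]
  containers:
  - name: app
    image: your-app:latest            # placeholder image
    volumeMounts:
    - name: local-ssd
      mountPath: /data
  volumes:
  - name: local-ssd
    hostPath:
      path: /mnt/disks/ssd0           # assumed local-SSD mount point on the node
```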