I am currently working with GPUs, and since they are expensive I want them to scale down and up depending on the load. However, scaling up the cluster and preparing a node takes around 8 minutes, since it installs the drivers and does some other preparation.
To solve this problem, I want to keep one node idle and autoscale the rest of the nodes. Is there any way to do this?
This way when a request comes, the idle node will take it and a new idle node will be created.
Thanks!
There are three different approaches:
1 - The first approach is entirely manual. It lets you keep a node idle without incurring downtime for your application while the autoscaler is working.
You would have to prevent one specific node from being autoscaled (let's call it "node A"). Create a new node and schedule replicas of node A's pods onto that new node. Node A keeps running but takes no part in the autoscaling process. Once the autoscaler has finished and the new nodes have booted, you may safely drain node A.
a. Create a new node.
b. Prevent the autoscaler from evicting node A's pods by adding the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to those pods.
c. Schedule replicas of node A's pods onto the new node.
d. Once the autoscaler has scaled all the nodes and they have finished booting, you may safely drain node A and delete it.
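Steps b and d above could be scripted roughly as follows. This is only a sketch: the node name, pod names, and namespace are hypothetical placeholders, and the function only builds the kubectl command lines rather than running them, so you can review them first.

```python
import shlex

# Pod annotation read by the cluster autoscaler (from step b).
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def manual_handover_commands(node_a, pods_on_node_a, namespace="default"):
    """Return the kubectl invocations for steps b and d, in order.

    node_a, pods_on_node_a and namespace are placeholders supplied by the
    caller; nothing is executed here.
    """
    commands = []
    # Step b: mark each pod on node A as not safe to evict.
    for pod in pods_on_node_a:
        commands.append([
            "kubectl", "annotate", "pod", pod, "-n", namespace,
            f"{SAFE_TO_EVICT}=false", "--overwrite",
        ])
    # Step d: once the replacement nodes are ready, drain node A and delete it.
    commands.append(["kubectl", "drain", node_a, "--ignore-daemonsets"])
    commands.append(["kubectl", "delete", "node", node_a])
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in manual_handover_commands("node-a", ["gpu-worker-0"]):
        print(shlex.join(cmd))
```

Steps a and c are left out on purpose, since how you create a node and schedule the replicas depends on your cloud provider and on how the workload is deployed.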
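2 - You could use a Pod Disruption Budget (PDB). It limits how many of the selected pods may be evicted at the same time, and the cluster autoscaler respects it when scaling down.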
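As a rough illustration of the second approach, the snippet below builds a PodDisruptionBudget manifest as JSON. The budget name, the "app: gpu-worker" label, and the minAvailable value are assumptions made up for the example; adjust them to match your own pods.

```python
import json


def pod_disruption_budget(name, match_labels, min_available=1, namespace="default"):
    """Build a policy/v1 PodDisruptionBudget manifest as a plain dict.

    All arguments are illustrative placeholders chosen by the caller.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # At least this many matching pods must stay available
            # during voluntary evictions.
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


if __name__ == "__main__":
    # Hypothetical name and label, for illustration only.
    manifest = pod_disruption_budget("gpu-worker-pdb", {"app": "gpu-worker"})
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with "kubectl apply -f".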
3 - If you would like to prevent node A from being deleted when the autoscaler scales down, you could set the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" on that particular node. This only affects scale-down; it has no effect on scale-up.
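For the third approach, one way to set the annotation is a JSON merge patch on the node. The sketch below builds the patch and the matching kubectl command without running it; the node name "node-a" is a hypothetical placeholder.

```python
import json

# Node annotation read by the cluster autoscaler (from approach 3).
SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"


def scale_down_disabled_patch(disabled=True):
    """Return a merge patch that sets the annotation on a node."""
    value = "true" if disabled else "false"
    return {"metadata": {"annotations": {SCALE_DOWN_DISABLED: value}}}


def patch_command(node_name, disabled=True):
    """Return the kubectl command line that applies the patch to node_name."""
    patch = json.dumps(scale_down_disabled_patch(disabled))
    return ["kubectl", "patch", "node", node_name, "--type", "merge", "-p", patch]


if __name__ == "__main__":
    # Hypothetical node name, for illustration only.
    print(patch_command("node-a"))
```

Calling patch_command with disabled=False produces the command that sets the annotation back to "false", should you later want the node to be eligible for scale-down again.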