On a k8s cluster (GCP), my pods are rescheduled automatically during node auto-scaling. The main problem is that they perform computations and keep the results in memory while auto-scaling happens. Because of the rescheduling, the pods lose all their results and tasks.
I want to disable rescheduling for specific pods, and I know there are a few possible approaches.
I have tried a PDB with minAvailable = 1, but it didn't work. I found that you can also set maxUnavailable = 0; would that be more effective? I don't fully understand the behaviour of maxUnavailable when it is set to 0. Could you explain it in more detail? Thank you!
Link for more details - https://github.com/dask/dask-kubernetes/issues/112
Are you specifying resource requests and limits?
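If you aren't, it's worth adding them; here is a minimal sketch of requests and limits for a worker container (the values are placeholders, not taken from your setup):

resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "1"
    memory: 2Gi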
Setting maxUnavailable to 0 is the way to go, and using node pools can also be a good workaround.
gcloud container node-pools create <nodepool> \
  --node-taints=app=dask-scheduler:NoSchedule \
  --node-labels=app=dask-scheduler
This will create the node pool tainted and labelled with app=dask-scheduler. Then, in the pod spec, you can do this:
nodeSelector:
  app: dask-scheduler
And put the dask scheduler on a node-pool that doesn't autoscale.
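One detail to watch: since the pool is created with a NoSchedule taint, the pod also needs a matching toleration alongside the nodeSelector, otherwise it won't be scheduled onto that pool. A minimal sketch, assuming the taint shown above:

tolerations:
- key: app
  operator: Equal
  value: dask-scheduler
  effect: NoSchedule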
There's an object called a PodDisruptionBudget (PDB), and in its spec you can set maxUnavailable. With maxUnavailable=1, if you had 100 pods defined, Kubernetes always makes sure only one of them is removed/drained/rescheduled at a time. If you have 2 pods and set maxUnavailable to 0, it will never remove your pods ("it" being the scheduler). For example:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
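For the Dask workers in the question, a sketch of a PDB with maxUnavailable: 0 could look like the following (the name and the app: dask-worker label are assumptions; match the selector to your worker pods' labels):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dask-worker-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: dask-worker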