I use Google Cloud Platform for my project.
Now, I have a cluster with 4 node pools:
- "micro-pool": with minimal machines for managing the cluster
- "cpu-pool": with cpu-only machines for processes that don't need a GPU
- 2 "gpu-pools": two pools with machines that have GPUs attached.
Now, what I need is for my CPU processes to never run on a GPU machine, because they take a long time and running them there just costs money for nothing.
I run my pods using:
kubectl run dc-1 --image={image-name} --replicas=1 --restart=Never --limits="nvidia.com/gpu=0,cpu=4000m,memory=2Gi" -- bash -c "command to execute"
Now, this works fine as long as no GPU machines are left over from previous GPU runs. But if there was a very recent GPU run, this command will run on that instance because it satisfies the minimum CPU and memory requirements. I thought --limits="nvidia.com/gpu=0" would do the trick, but obviously it didn't.
What should I do?
This is a good use case for taints and tolerations. You can taint the GPU nodes with NoSchedule. This will prevent pods (even system pods) that don't have a toleration for that taint from running on the GPU nodes:
kubectl taint nodes gpuNode1 nodetype=gpu:NoSchedule
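Because GKE may add and remove GPU nodes as the pools scale, it can be handy to taint every current node of a pool by label instead of by name. A sketch, assuming the GKE-provided cloud.google.com/gke-nodepool label and a pool named gpu-pool-1:
kubectl taint nodes -l cloud.google.com/gke-nodepool=gpu-pool-1 nodetype=gpu:NoSchedule
Note that taints applied with kubectl are not inherited by nodes created later, so for autoscaled pools it is better to set the taint on the node pool itself.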
Then, on pods you do want to run on these nodes, you can add a toleration for the taint:
tolerations:
- key: "nodetype"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
I'm not sure on GCP, but on Azure's AKS you can configure the taint when you create the cluster and the node pools.
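On GKE there appears to be an equivalent: the --node-taints flag of gcloud container node-pools create, which makes every node in that pool (including ones the autoscaler adds later) come up with the taint already set. A sketch; the cluster name, pool name and accelerator type are placeholders:
gcloud container node-pools create gpu-pool-1 \
  --cluster my-cluster \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --node-taints nodetype=gpu:NoSchedule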
Edit:
You will want to combine this with Harsh Manvar's suggestion of node selectors and/or affinity. Just because your pod can tolerate the taint doesn't mean it will necessarily be scheduled on the GPU nodes; it just makes sure everything else is not.
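For example, a GPU pod that both tolerates the taint and selects a GPU pool could look roughly like this (a sketch; the pod name, image and pool name are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  containers:
  - name: gpu-job
    image: {gpu-image-name}
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "nodetype"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  nodeSelector:
    cloud.google.com/gke-nodepool: gpu-pool-1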
If you want to assign a pod to a particular instance or node, you can use a Kubernetes node selector.
For example:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Here the pod will only be scheduled onto nodes that carry the disktype: ssd label.
You can also check this URL for further documentation: https://kubernetes.io/docs/concepts/configuration/assign-pod-node
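For the disktype example to match anything, the node has to carry that label first; you can add it manually (node name is a placeholder):
kubectl label nodes {node-name} disktype=ssd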
Edit 1:
As you are on GCP, you can also use the node pool label that GKE sets automatically:
nodeSelector:
  # <label-name>: <value>
  cloud.google.com/gke-nodepool: pool-highcpu8   # your node pool name
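Since you start your pods with kubectl run, one way to pass such a nodeSelector there is the --overrides flag. A sketch that reuses your original command and your cpu-pool name:
kubectl run dc-1 --image={image-name} --restart=Never \
  --overrides='{"apiVersion": "v1", "spec": {"nodeSelector": {"cloud.google.com/gke-nodepool": "cpu-pool"}}}' \
  -- bash -c "command to execute"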
Edit 2:
If you are familiar with affinity and anti-affinity, you can use those as well.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/node-type
            operator: In
            values:
            - gpu
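On GKE you can use the built-in pool label here as well; unlike nodeSelector, the In operator lets you list both of your GPU pools at once (pool names are placeholders):
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values:
            - gpu-pool-1
            - gpu-pool-2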
For CPU:
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: resources
              operator: In
              values:
              - cpu-only
          topologyKey: kubernetes.io/hostname
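An alternative for the CPU pods that maps more directly onto keeping them off GPU machines is node anti-affinity with the NotIn operator on the GKE pool label, independent of any taints (pool names are placeholders):
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: NotIn
            values:
            - gpu-pool-1
            - gpu-pool-2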