How to prevent a GKE Kubernetes pod from running on a GPU instance?

8/16/2019

I use Google Cloud Platform for my project.
Now, I have a cluster with 4 node pools:
- "micro-pool": with minimal machines for managing the cluster
- "cpu-pool": with cpu-only machines for processes that don't need a GPU
- 2 "gpu-pools": two pools with machines that have GPUs attached.

What I need is for my CPU processes never to run on a GPU machine: they take a long time, and running them on a GPU machine just costs money for nothing.
I run my pods using the following command:
kubectl run dc-1 --image={image-name} --replicas=1 --restart=Never --limits="nvidia.com/gpu=0,cpu=4000m,memory=2Gi" -- bash -c "command to execute"

This works fine as long as there are no GPU machines left over from previous GPU runs. But if there was a very recent GPU run, the pod gets scheduled on that GPU instance, because the instance satisfies the pod's CPU and memory requirements. I thought --limits="nvidia.com/gpu=0" would do the trick, but obviously it didn't.
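
To see why the GPU limit has no effect on placement, here is roughly the Pod that the kubectl run command above creates (a sketch; {image-name} and the command string are the question's placeholders). Because no requests are given, Kubernetes uses the limits as requests, so the scheduler only looks for a node with 4 CPUs and 2 GiB of memory free. A GPU limit of 0 says the pod needs no GPUs; it does not tell the scheduler to avoid nodes that have them.

```yaml
# Roughly the Pod created by the kubectl run command in the question.
# {image-name} and the command string are the question's placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: dc-1
spec:
  restartPolicy: Never            # from --restart=Never
  containers:
  - name: dc-1
    image: "{image-name}"
    args: ["bash", "-c", "command to execute"]   # words after "--" become args
    resources:
      limits:                     # no requests given, so requests default to these
        nvidia.com/gpu: 0         # means "needs no GPU", not "must avoid GPU nodes"
        cpu: 4000m
        memory: 2Gi
```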

What should I do?

-- Ahmedn1
google-cloud-platform
gpu
kubectl
kubernetes

2 Answers

8/16/2019

This is a good use case for taints and tolerations. You can taint the GPU nodes with NoSchedule. This will prevent any pod (even system pods) that doesn't have a toleration for that taint from being scheduled on the GPU nodes.

kubectl taint nodes gpuNode1 nodetype=gpu:NoSchedule
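
The taint is stored on the Node object itself. After running the command above, the node would contain roughly the following (an excerpt with other fields omitted; gpuNode1 is the placeholder node name from the command). Keep in mind that kubectl taint only changes nodes that exist right now: nodes the autoscaler adds to the pool later, or nodes recreated during an upgrade, will not carry the taint, so for autoscaled GPU pools it is safer to configure the taint on the node pool itself.

```yaml
# Excerpt of the tainted node, as seen with: kubectl get node gpuNode1 -o yaml
apiVersion: v1
kind: Node
metadata:
  name: gpuNode1          # placeholder node name from the taint command
spec:
  taints:
  - key: nodetype
    value: gpu
    effect: NoSchedule    # pods without a matching toleration are not scheduled here
```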

Then, on pods you do want to run on these nodes, you can add a toleration for the taint:

tolerations:
- key: "nodetype"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

I'm not sure about GCP, but on Azure's AKS you can configure the taint when you create the cluster and the node pools.

Edit:

You will want to combine this with Harsh Manvar's suggestion of node selectors and/or affinity. A toleration only allows your pod to be scheduled on the GPU nodes; it does not guarantee that the pod will end up there. The taint only makes sure that other pods stay off them.
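
A minimal sketch of that combination for a GPU job, assuming the nodes were tainted as above: the toleration lets the pod onto the tainted nodes, and the nodeSelector on GKE's node pool label keeps it from landing anywhere else. The pool name gpu-pool-1 and the pod name gpu-job are hypothetical; the image and command are the question's placeholders.

```yaml
# Sketch: GPU pod that tolerates the GPU taint AND is pinned to a GPU pool.
# "gpu-pool-1" and "gpu-job" are hypothetical names; replace with your own.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-nodepool: gpu-pool-1   # only nodes from this pool qualify
  tolerations:
  - key: "nodetype"                             # matches nodetype=gpu:NoSchedule
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: gpu-job
    image: "{image-name}"
    args: ["bash", "-c", "command to execute"]
    resources:
      limits:
        nvidia.com/gpu: 1                       # actually request one GPU
```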

-- Mike Breed
Source: StackOverflow

8/16/2019

If you want to assign a pod to a particular instance or node, you can use a Kubernetes node selector.

For example:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd

Here the pod will only be scheduled on nodes that carry the label disktype=ssd.
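
The selector only matches nodes that actually have the label. You can add it with kubectl label nodes <node-name> disktype=ssd, after which the node would contain something like this excerpt (<node-name> is a placeholder). As with taints, a label added by hand is lost when the node is recreated; GKE's built-in node pool label avoids that problem.

```yaml
# Excerpt of a node after: kubectl label nodes <node-name> disktype=ssd
apiVersion: v1
kind: Node
metadata:
  name: <node-name>     # placeholder
  labels:
    disktype: ssd       # the key/value the pod's nodeSelector must match exactly
```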

See the Kubernetes documentation for further details: https://kubernetes.io/docs/concepts/configuration/assign-pod-node

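Edit 1:

Since you are on GCP, you can also select nodes by node pool, because GKE labels every node with the name of its node pool:

nodeSelector:
  # <label-name>: <value>
  cloud.google.com/gke-nodepool: pool-highcpu8   # replace pool-highcpu8 with your pool name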
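
Applied to the question, a sketch of the CPU job as a Pod manifest pinned to the CPU pool, assuming the pool is literally named cpu-pool as in the question. If that pool has no free capacity, the pod stays Pending (or the autoscaler grows cpu-pool, if enabled) instead of falling back to a GPU node.

```yaml
# Sketch: the question's CPU job restricted to the CPU node pool.
# Assumes the GKE node pool is named "cpu-pool", as in the question.
apiVersion: v1
kind: Pod
metadata:
  name: dc-1
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-nodepool: cpu-pool   # never scheduled outside this pool
  containers:
  - name: dc-1
    image: "{image-name}"
    args: ["bash", "-c", "command to execute"]
    resources:
      limits:
        cpu: 4000m
        memory: 2Gi
```

It can be created with kubectl apply -f followed by the file name.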

Edit 2:

If you are familiar with affinity and anti-affinity, you can use those as well. For GPU pods (assuming the GPU nodes are labeled kubernetes.io/node-type=gpu):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/node-type
            operator: In
            values:
            - gpu

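For CPU pods, node anti-affinity is written as nodeAffinity with the NotIn operator (podAntiAffinity matches the labels of other pods, not of nodes):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/node-type
            operator: NotIn
            values:
            - gpu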
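
On GKE you may not need to label the nodes yourself: GPU nodes carry GKE's cloud.google.com/gke-accelerator label, so a CPU pod can exclude every GPU node with the DoesNotExist operator. This is a sketch assuming that standard GKE label is present on your GPU node pools.

```yaml
# Sketch: keep a pod off every GKE GPU node without adding custom labels.
# Assumes GKE's cloud.google.com/gke-accelerator label is set on GPU nodes.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-accelerator
            operator: DoesNotExist   # only nodes without a GPU accelerator label
```
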
-- Harsh Manvar