I have multiple GPU cards within one machine, and I need to let Kubernetes allocate GPU/NPU devices following some rules I set.
For example, suppose there are 8 GPU cards with IDs 0-7, and only device0, device1, device6, and device7 are available. Now I need to create one pod with 2 devices, and these two devices must be either (device0, device1) or (device6, device7). Other combinations such as (device0, device6) are not valid.
Is there any way to do that? I am using Kubernetes version 1.18 and have implemented my own device plugin.
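To make the pairing rule concrete: the core of what you want is a function that, given the set of healthy device IDs, only ever hands out one of the allowed pairs. Below is a minimal, self-contained Go sketch of that selection logic, with hypothetical names (`validPairs`, `pickPair`) not taken from any real API. Note that the kubelet hook for influencing which devices are chosen, `GetPreferredAllocation`, only landed in the device plugin API in Kubernetes 1.19, so on 1.18 you can at best approximate this by failing `Allocate` when the kubelet picks an invalid combination.

```go
package main

import "fmt"

// validPairs lists the only device-ID combinations that may be allocated
// together, per the rule in the question: (0,1) or (6,7).
var validPairs = [][2]string{{"0", "1"}, {"6", "7"}}

// pickPair returns the first valid pair whose devices are both currently
// available, or ok=false if no allowed combination can be satisfied.
func pickPair(available map[string]bool) (pair [2]string, ok bool) {
	for _, p := range validPairs {
		if available[p[0]] && available[p[1]] {
			return p, true
		}
	}
	return [2]string{}, false
}

func main() {
	// Devices 0, 1, 6, 7 are available, as in the question.
	avail := map[string]bool{"0": true, "1": true, "6": true, "7": true}
	if p, ok := pickPair(avail); ok {
		fmt.Println(p[0], p[1]) // prints "0 1"
	}
}
```

On 1.19+ you would call something like `pickPair` from your plugin's `GetPreferredAllocation` handler and return the chosen IDs; the kubelet then prefers that combination when allocating.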
I don't understand why you would write a rule like this:
every device-id be smaller than 4
If you want to limit the number of GPUs, you should use limits and requests, which are nicely explained in Schedule GPUs. So you can limit the resource to only 4 GPUs like so:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 4 # requesting 4 GPUs
If you have different types of GPUs on different nodes, you can use labels, as described in Clusters containing different types of GPUs.
# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
If your nodes are running different versions of GPUs, then use Node Labels and Node Selectors to schedule pods to appropriate GPUs. Following is an illustration of this workflow:
As part of your Node bootstrapping, identify the GPU hardware type on your nodes and expose it as a node label.
NVIDIA_GPU_NAME=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0)
source /etc/default/kubelet
KUBELET_OPTS="$KUBELET_OPTS --node-labels='alpha.kubernetes.io/nvidia-gpu-name=$NVIDIA_GPU_NAME'"
echo "KUBELET_OPTS=$KUBELET_OPTS" > /etc/default/kubelet
Specify the GPU types a pod can use via Node Affinity rules.
kind: Pod
apiVersion: v1
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/affinity: >
      {
        "nodeAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
              {
                "matchExpressions": [
                  {
                    "key": "alpha.kubernetes.io/nvidia-gpu-name",
                    "operator": "In",
                    "values": ["Tesla K80", "Tesla P100"]
                  }
                ]
              }
            ]
          }
        }
      }
spec:
  containers:
    - name: gpu-container-1
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 2
This ensures that the pod will be scheduled to a node that has a Tesla K80 or a Tesla P100 NVIDIA GPU.
You can find other relevant information on unofficial-kubernetes Scheduling gpus.
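The annotation-based affinity above is the legacy alpha form; on Kubernetes 1.18 the same constraint would normally be written as a first-class field under `spec.affinity`. A rough equivalent is sketched below, reusing the `accelerator` node labels applied earlier (the pod name and image here are just placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator
                operator: In
                values: ["nvidia-tesla-k80", "nvidia-tesla-p100"]
  containers:
    - name: gpu-container-1
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 2
```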