How to properly label and configure Kubernetes to use Nvidia GPUs?

11/12/2019

I have an in-house K8s cluster running on bare metal. One of my worker nodes has 4 GPUs, and I want to configure K8s to recognise and use them. Based on the official documentation I installed everything that is required, and now when I run:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi


Tue Nov 12 09:20:20 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:02:00.0 Off |                  N/A |
| 29%   25C    P8     2W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:03:00.0 Off |                  N/A |
| 29%   25C    P8     1W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:82:00.0 Off |                  N/A |
| 29%   26C    P8     2W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:83:00.0 Off |                  N/A |
| 29%   26C    P8    12W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I know that I have to label the node so K8s recognises these GPUs, but I can't find the correct labels in the official documentation. In the docs I just see this:

# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80

While in another tutorial (just for Google Cloud) I found this:

aliyun.accelerator/nvidia_count=1                          #This field is important.
aliyun.accelerator/nvidia_mem=12209MiB
aliyun.accelerator/nvidia_name=Tesla-M40

So what is the proper way to label my node? Do I also need to label it with the number of GPUs and their memory size?

-- AVarf
gpu
kubernetes
nvidia

1 Answer

11/12/2019

I see you are trying to make sure that your pod gets scheduled on a node with GPUs.

The easiest way to do it would be to label a node that has GPUs like this:

kubectl label node <node_name> has_gpu=true

and then, when creating your pod, add a nodeSelector field with has_gpu: true. This way the pod will be scheduled only on nodes with GPUs. Read more in the k8s docs.
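For example, a minimal Pod spec using that selector could look like this (the pod and container names are placeholders, and I'm reusing the nvidia/cuda image from your test just as an illustration):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  nodeSelector:
    has_gpu: "true"    # label values are strings, so quote "true"
  containers:
  - name: gpu-demo-ctr
    image: nvidia/cuda
    command: ["nvidia-smi"]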

The only problem with this approach is that the scheduler is not aware of how many GPUs are on the node, so it can schedule more than 4 pods onto a node with only 4 GPUs.

A better option would be to use a node extended resource.

It would look as follows:

  1. Run kubectl proxy.
  2. Patch the node's resource configuration:

    curl --header "Content-Type: application/json-patch+json" \
    --request PATCH \
    --data '[{"op": "add", "path": "/status/capacity/example.com~1gpu", "value": "4"}]' \
    http://localhost:8001/api/v1/nodes/<your-node-name>/status
  3. Assign the extended resource to a pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: extended-resource-demo
    spec:
      containers:
      - name: extended-resource-demo-ctr
        image: my_pod_name
        resources:
          requests:
            example.com/gpu: 1
          limits:
            example.com/gpu: 1

In this case the scheduler is aware of how many GPUs are available on the node and won't schedule more pods than it can satisfy.
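As a quick sanity check, you can describe the node after the patch (node name is a placeholder); the Capacity section should now list example.com/gpu: 4:

kubectl describe node <your-node-name>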

-- HelloWorld
Source: StackOverflow