How to ensure the container runtime is nvidia-docker for the kubernetes node?

2/8/2019

I need to check if a Kubernetes node is configured correctly. I need to use nvidia-docker for one of the worker nodes.

Using: https://github.com/NVIDIA/k8s-device-plugin

How can I confirm that the configuration is correct for the device plugin?

$ kubectl describe node mynode
Roles:              worker
Capacity:
 cpu:                4
 ephemeral-storage:  15716368Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             62710736Ki
 nvidia.com/gpu:     1
 pods:               110
Allocatable:
 cpu:                3800m
 ephemeral-storage:  14484204725
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             60511184Ki
 nvidia.com/gpu:     1
 pods:               110
System Info:
 Machine ID:                 f32e0af35637b5dfcbedcb0a1de8dca1
 System UUID:                EC2A40D3-76A8-C574-0C9E-B9D571AA59E2
 Boot ID:                    9f2fa456-0214-4f7c-ac2a-2c62c2ef25a4
 Kernel Version:             3.10.0-957.1.3.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.9.1
 Kubelet Version:            v1.11.2
 Kube-Proxy Version:         v1.11.2

I can see nvidia.com/gpu under the node resources; however, the question is: is the Container Runtime Version supposed to say nvidia-docker if the node is configured correctly? Currently it shows docker, which seems fishy, I guess!

-- enator
gpu
kubernetes
nvidia
nvidia-docker

1 Answer

2/8/2019

Not sure if you did it already, but it seems to be clearly described:

After installing the NVIDIA drivers and nvidia-docker, you need to enable the nvidia runtime on your node by editing /etc/docker/daemon.json, as specified here. So, as the instructions say, check that the runtimes section of that config is correct; if it is not, you just need to edit it.
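
For reference, a daemon.json that enables the nvidia runtime as the default typically looks like this (a sketch based on the plugin's README; the runtime path may differ on your system):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

After editing it, restart Docker (for example with sudo systemctl restart docker) so the new default runtime takes effect, and confirm it on the node with docker info | grep -i runtime. Note that kubectl describe node will still report docker://18.9.1 either way: the nvidia runtime is registered with Docker rather than being a separate container runtime, so the docker:// value in your output is not by itself a sign of misconfiguration.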

Then deploy a DaemonSet (which is a way of ensuring that a pod runs on each node, with access to the host network and devices):

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
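
To double-check that the plugin came up, you can verify that the device plugin pod is running (it normally lands in the kube-system namespace) and that the node still advertises the GPU:

kubectl get pods -n kube-system | grep nvidia-device-plugin
kubectl describe node mynode | grep nvidia.com/gpu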

Now your containers are ready to consume the GPU - as described here.
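
As a quick smoke test, a pod along these lines (a sketch; the pod name and image are just examples) should land on the GPU node and be able to see the device:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU handed out by the device plugin

If kubectl logs gpu-test shows the usual nvidia-smi table, the driver, the nvidia runtime, and the device plugin are all wired up.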

-- aurelius
Source: StackOverflow