I need to check whether my Kubernetes node is configured correctly. I need to use nvidia-docker on one of the worker nodes.
Using: https://github.com/NVIDIA/k8s-device-plugin
How can I confirm that the configuration is correct for the device plugin?
$ kubectl describe node mynode
Roles:  worker
Capacity:
  cpu:                4
  ephemeral-storage:  15716368Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             62710736Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                3800m
  ephemeral-storage:  14484204725
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             60511184Ki
  nvidia.com/gpu:     1
  pods:               110
System Info:
  Machine ID:                 f32e0af35637b5dfcbedcb0a1de8dca1
  System UUID:                EC2A40D3-76A8-C574-0C9E-B9D571AA59E2
  Boot ID:                    9f2fa456-0214-4f7c-ac2a-2c62c2ef25a4
  Kernel Version:             3.10.0-957.1.3.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.9.1
  Kubelet Version:            v1.11.2
  Kube-Proxy Version:         v1.11.2
I can see nvidia.com/gpu under the node resources. The question is: is the Container Runtime Version supposed to say nvidia-docker if the node is configured correctly? Currently it shows docker, which seems fishy, I guess!
Not sure if you did it already, but the setup seems to be clearly described:
After installing the NVIDIA drivers and nvidia-docker, you need to enable the nvidia runtime on your node by editing /etc/docker/daemon.json, as specified here. The Container Runtime Version reported by kubectl describe node will still say docker://, and that is expected: nvidia-docker 2 plugs an extra OCI runtime into Docker rather than replacing it. So, as the instructions say, check that the runtimes section of that config is set up correctly, and edit it if the nvidia runtime is missing or not the default (a sketch of the expected contents follows below).
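As a rough illustration, a minimal /etc/docker/daemon.json after installing nvidia-docker 2 typically looks like the snippet below. The path to nvidia-container-runtime is an assumption and may differ on your system, so treat this as a sketch rather than a drop-in file:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Setting default-runtime to nvidia matters because the kubelet launches containers through Docker's default runtime. After editing the file, restart Docker (for example, sudo systemctl restart docker on CentOS 7) so the change takes effect.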
Then deploy a DaemonSet (which is a way of ensuring that a pod runs on each node, with access to the host network and devices):
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
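Once it is deployed, a quick way to confirm the plugin is running and the node still advertises the GPU resource is sketched below; the commands are generic kubectl, but the exact pod and DaemonSet names depend on the plugin version:

# Check that the device-plugin DaemonSet and its pod are up in kube-system
kubectl get daemonset -n kube-system
kubectl get pods -n kube-system | grep nvidia

# Re-check that the node still advertises the GPU resource
kubectl describe node mynode | grep nvidia.com/gpu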
Now your containers are ready to consume the GPU, as described here; a minimal example pod spec is sketched below.
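This is only a sketch: the pod name, image tag, and command are placeholders, and the important part is the nvidia.com/gpu entry under resources.limits, which is what the device plugin exposes:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                      # placeholder name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base     # any CUDA-enabled image should work
      command: ["nvidia-smi"]         # prints the GPU if the chain is wired up
      resources:
        limits:
          nvidia.com/gpu: 1           # request one GPU from the device plugin

If kubectl logs gpu-test shows the usual nvidia-smi table, the whole chain (driver, nvidia runtime, device plugin) is working on that node.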