Docker Container nvidia/k8s-device-plugin:1.9 Keeps Reporting Error

2/9/2020

I am trying to set up a small Kubernetes cluster on my Ubuntu 18.04 LTS server. Every step is done, but the GPU status check fails. The container keeps reporting errors:

1. Issue Description
I followed the steps in the Quick Start, but when I run the test case it reports errors.

2. Steps to reproduce the issue

  • Run the shell command

    docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.9

  • Check the errors

    2020/02/09 00:20:15 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
    2020/02/09 00:20:15 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
    2020/02/09 00:20:15 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
    2020/02/09 00:20:15 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
    2020/02/09 00:20:15 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
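The "Could not contact Kubelet" message points at the registration socket on the kubelet side. A minimal sanity check, assuming the default kubelet paths (adjust if your install differs):

    # Check that the kubelet registration socket exists on the host
    ls -l /var/lib/kubelet/device-plugins/kubelet.sock

    # Check the kubelet version, since the device plugin
    # registration API has changed between Kubernetes releases
    kubelet --version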

3. Environment Information
  • outputs of nvidia-docker run --rm dlws/cuda nvidia-smi

    NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2

  • contents of /etc/docker/daemon.json

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
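Note that Docker only reads daemon.json on startup, so the default runtime applies after a restart. A quick way to confirm it took effect (assuming a systemd-based host such as Ubuntu 18.04):

    # Restart Docker so the edited daemon.json is picked up
    sudo systemctl restart docker

    # Confirm nvidia is now the default runtime
    docker info | grep -i 'default runtime'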

  • docker version: 19.03.2
  • kubernetes version: 1.15.2
-- Steve
docker
gpu
kubectl
kubernetes
nvidia-docker

1 Answer

2/24/2020

Finally I found the answer. I hope this post is helpful for others who run into the same issue:

For Kubernetes 1.15, use k8s-device-plugin:1.11 instead. Version 1.9 is not able to communicate with the kubelet.
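Concretely, the standalone test from the question should stop failing once the image tag is bumped; this is the same command with only the tag changed:

    docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it \
        -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
        nvidia/k8s-device-plugin:1.11

For the in-cluster deployment, the project's Quick Start for that release ships a DaemonSet manifest. A sketch of deploying it and verifying the result (the manifest URL follows the repository's tag layout at the time; verify it against the README before use):

    # Deploy the matching plugin version as a DaemonSet
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

    # The node should now advertise the GPU resource
    kubectl describe node | grep -i 'nvidia.com/gpu'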

-- Steve
Source: StackOverflow