Cannot schedule Kubernetes pods with a request for nvidia.com/gpu

4/15/2018

I have been able to get Kubernetes to recognise the GPUs on my nodes:

$ kubectl get node MY_NODE -o yaml
...
allocatable:
  cpu: "48"
  ephemeral-storage: "15098429006"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263756344Ki
  nvidia.com/gpu: "8"
  pods: "110"
capacity:
  cpu: "48"
  ephemeral-storage: 16382844Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263858744Ki
  nvidia.com/gpu: "8"
  pods: "110"
...

and I spin up a pod with:

Limits:
  cpu:             2
  memory:          2147483648
  nvidia.com/gpu:  1
Requests:
  cpu:             500m
  memory:          536870912
  nvidia.com/gpu:  1
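
For reference, this is roughly how the resources are declared in the pod manifest (the pod name, container name and image are just placeholders; note that for nvidia.com/gpu the request has to equal the limit):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-base
    resources:
      limits:
        cpu: "2"
        memory: "2Gi"
        nvidia.com/gpu: 1
      requests:
        cpu: "500m"
        memory: "512Mi"
        nvidia.com/gpu: 1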

However, the pod stays in Pending with:

Insufficient nvidia.com/gpu.
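
The full reason shows up in the pod's events and in the node's resource summary:

$ kubectl describe pod MY_POD    # Events section shows the FailedScheduling message
$ kubectl describe node MY_NODE  # shows allocatable resources and what is already requested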

Am I specifying the resources correctly?

-- yee379
gpu
kubernetes
nvidia

1 Answer

4/17/2018

Have you installed the NVIDIA device plugin in Kubernetes?

kubectl create -f nvidia.io/device-plugin.yml
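
If the plugin is deployed, you should see one device-plugin pod running on every GPU node (the exact pod name depends on the manifest you used):

$ kubectl get pods -n kube-system | grep nvidia-device-plugin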

Some devices are too old and cannot be health-checked, so health checking must be disabled with this option:

containers:
- image: nvidia/k8s-device-plugin:1.9
  name: nvidia-device-plugin-ctr
  env:
  - name: DP_DISABLE_HEALTHCHECKS
    value: "xids"

Take a look at the NVIDIA k8s-device-plugin README for more details.

-- Nicola Ben
Source: StackOverflow