I'm trying to schedule GPUs in Kubernetes v1.13.1 and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin
But the GPU resource doesn't show up when I run kubectl get nodes -o yaml, so, according to this post, I checked the NVIDIA GPU device plugin.
I ran:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
several times, and the result is:
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists
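(The AlreadyExists error only tells me that the DaemonSet object is already there; a quick way to double-check it, using the name from the error message, would be something like

kubectl -n kube-system get daemonset nvidia-device-plugin-daemonset

which should list the desired/ready pod counts for the plugin DaemonSet.)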
It seems that the NVIDIA device plugin is already installed? But the output of kubectl get pods --all-namespaces is:
NAMESPACE    NAME                              READY  STATUS   RESTARTS  AGE
kube-system  calico-node-qdhvd                 2/2    Running  0         65m
kube-system  coredns-78d4cf999f-fk4wl          1/1    Running  0         68m
kube-system  coredns-78d4cf999f-zgfvl          1/1    Running  0         68m
kube-system  etcd-liuqin01                     1/1    Running  0         67m
kube-system  kube-apiserver-liuqin01           1/1    Running  0         67m
kube-system  kube-controller-manager-liuqin01  1/1    Running  0         67m
kube-system  kube-proxy-l8p9p                  1/1    Running  0         68m
kube-system  kube-scheduler-liuqin01           1/1    Running  0         67m
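The device plugin pod is missing from that list, so a narrower check of the same namespace (just a plain grep, nothing plugin-specific assumed) might make it clearer whether and where the pod is running:

kubectl -n kube-system get pods -o wide | grep nvidia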
When I run kubectl describe node, the GPU is not listed among the allocatable resources:
Non-terminated Pods:  (9 in total)
  Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                                  ------------  ----------  ---------------  -------------  ---
  kube-system  calico-node-qdhvd                     250m (2%)     0 (0%)      0 (0%)           0 (0%)         18h
  kube-system  coredns-78d4cf999f-fk4wl              100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
  kube-system  coredns-78d4cf999f-zgfvl              100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
  kube-system  etcd-liuqin01                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-apiserver-liuqin01               250m (2%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-controller-manager-liuqin01      200m (1%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-proxy-l8p9p                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-scheduler-liuqin01               100m (0%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  nvidia-device-plugin-daemonset-p78wz  0 (0%)        0 (0%)      0 (0%)           0 (0%)         26m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                1 (8%)      0 (0%)
  memory             140Mi (0%)  340Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
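Before changing anything, the plugin pod's own log is worth a look (pod name copied from the table above); in setups where Docker's default runtime is not nvidia, that log typically contains a "Failed to initialize NVML" error:

kubectl -n kube-system logs nvidia-device-plugin-daemonset-p78wz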
As lianyouCat mentioned in the comments: after installing nvidia-docker2, Docker's default runtime has to be changed to nvidia, as described in github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes. After modifying /etc/docker/daemon.json, you need to restart Docker so that the configuration takes effect.
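For reference, a minimal sketch of that change, following the snippet in the k8s-device-plugin README (merge with any settings you already have in daemon.json rather than overwriting, and adjust the nvidia-container-runtime path if it differs on your system):

sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker

# recreate the plugin pod so it re-registers the GPU with the kubelet
kubectl -n kube-system delete pod nvidia-device-plugin-daemonset-p78wz

After the DaemonSet brings the pod back up, nvidia.com/gpu should show up under Capacity and Allocatable in kubectl describe node.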