I have a k8s cluster with one master and one worker. The worker has GPUs, and I am able to schedule GPU pods on it.
I then added a second worker, also with GPUs, and everything is properly installed on it. When I try to schedule a GPU busybox pod on that second worker, the pod ends up with status UnexpectedAdmissionError.
With kubectl describe pod busybox, I see this event:

    Warning UnexpectedAdmissionError 7m kubelet, wikiserver Update plugin resources failed due to requested number of devices unavailable for nvidia.com/gpu. Requested: 1, Available: 0, which is unexpected.

This is odd, because kubectl describe nodes second-worker shows both Capacity and Allocatable as nvidia.com/gpu: 1.
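To compare the GPU figures the API server reports for every node at once, the output of kubectl get nodes -o json can be reduced to per-node capacity and allocatable values. Below is a minimal Python sketch, assuming that JSON has already been parsed into a dict; the function name gpu_counts is hypothetical. Keep in mind that these are node-level totals: Allocatable does not subtract GPUs already assigned to running pods, so it can read 1 even when the kubelet has no free device left to hand out.

```python
def gpu_counts(node_list, resource="nvidia.com/gpu"):
    """Map node name -> (capacity, allocatable) for one extended resource.

    `node_list` is assumed to be the parsed output of
    `kubectl get nodes -o json` (a v1 List whose `items` are Node objects).
    Quantities are kept as the strings Kubernetes reports, e.g. "1".
    Nodes that do not advertise the resource are reported as "0".
    """
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        status = node.get("status", {})
        # Missing maps or missing keys both mean "resource not advertised".
        capacity = status.get("capacity", {}).get(resource, "0")
        allocatable = status.get("allocatable", {}).get(resource, "0")
        counts[name] = (capacity, allocatable)
    return counts
```

One way to feed it is json.loads(subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"])) and then print the resulting dict; a node whose allocatable value differs from the others stands out immediately.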
I was able to schedule the GPU busybox pod on the first worker successfully. I was also able to schedule a CPU-only busybox pod on both workers.
This is the YAML config for the GPU busybox pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox
      namespace: default
    spec:
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Always
      nodeName: secondworker

The second worker (as well as the first worker and the master) is configured as described in the nvidia-device-plugin instructions.