I'm using Kubernetes (K8s) v1.7 and wondering whether I can share a single GPU among multiple pods. I have an MNIST machine learning program in TensorFlow where the per-process GPU memory fraction is set to 30% and allow_growth
is set to false. Two manifest files are used to deploy two separate jobs under Kubernetes. When I run them together, one gets scheduled and the other waits because the GPU resource is not available. In each manifest file I set alpha.kubernetes.io/nvidia-gpu
to 0.5. When I check the GPU processes, only one process is ever shown running. If I stop one job from Kubernetes, the other gets scheduled and shows up in the GPU process list. FYI, the machine I'm using has only one GPU. So my question is: does Kubernetes support GPU sharing, and can I express it in the manifest file?
A portion of the manifest (both jobs make the same GPU request):
resources:
  limits:
    alpha.kubernetes.io/nvidia-gpu: 0.5
The output of the nvidia-smi command shows one process at a time:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26 Driver Version: 396.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:0B:00.0 Off | N/A |
| 0% 33C P2 59W / 275W | 177MiB / 11178MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 15513 C python3 167MiB |
+-----------------------------------------------------------------------------+
GPU sharing across multiple containers is not supported at the moment, and it is unlikely to be supported anytime soon. You would need to make each virtual machine a separate Kubernetes node, each with its own GPU.
The official Kubernetes documentation says the minimum value you can request for a GPU in a pod is 1; fractional values are not accepted. Since Kubernetes doesn't support sharing a single GPU across pods, you can look into the Kubeflow project instead.
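As a sketch of what the scheduler will accept, the GPU limit must be a whole number; the job name and image tag below are illustrative placeholders, not values from the question:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mnist-job          # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: mnist
        image: tensorflow/tensorflow:1.8.0-gpu   # example GPU image
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1    # must be an integer on K8s 1.7
      restartPolicy: Never
```

With a single physical GPU, this means only one such job can hold the GPU at a time, which matches the scheduling behavior you observed.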