Sharing GPU to multiple containers in Kubernetes or Fractional GPU Resource Request

6/13/2018

I'm using Kubernetes (K8s) v1.7 and wondering if I can share a GPU among multiple pods. I have an MNIST machine learning program in TensorFlow where the GPU memory allocation is set to 30% and allow_growth is false. Two manifest files are used to deploy two separate jobs under K8s. When I run them together, one gets scheduled and the other waits because the GPU resource is not available. In my manifest file, I set alpha.kubernetes.io/nvidia-gpu to 0.5. When I check the GPU processes, it always shows one process running. If I stop one job from Kubernetes, the other gets scheduled and shows up in the GPU processes. FYI, the machine I'm using has only one GPU. So my question is: does Kubernetes support GPU sharing? Can I share a GPU and define that in the manifest file?
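For context, the "30% allocation with allow_growth false" described above corresponds to a TensorFlow 1.x session configuration fragment along these lines (a sketch; the actual MNIST script is not shown in the question):

```python
import tensorflow as tf

# Cap this process at ~30% of the GPU's memory and keep
# on-demand growth disabled, as described in the question.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3,
                            allow_growth=False)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build and run the MNIST graph here ...
    pass
```

Note that this only limits how much GPU memory one TensorFlow process uses; it does not make the Kubernetes scheduler treat the GPU as a divisible resource.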

A portion of the Manifest (both jobs have same GPU request)

  resources:
    limits:
      alpha.kubernetes.io/nvidia-gpu: 0.5

The output of the nvidia-smi command shows 1 process at a time

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0B:00.0 Off |                  N/A |
|  0%   33C    P2    59W / 275W |    177MiB / 11178MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     15513      C   python3                                      167MiB |
+-----------------------------------------------------------------------------+ 
-- Abu Shoeb
gpu
kubernetes
mnist
nvidia
tensorflow

2 Answers

6/13/2018

GPU sharing across multiple containers is not supported at the moment, and it is unlikely to be supported anytime soon. You would need each virtual machine to be a separate Kubernetes node, each with its own GPU.

-- Zac R.
Source: StackOverflow

2/12/2019

The official Kubernetes documentation says the minimum GPU value you can request for a pod is 1, not a fraction. Since Kubernetes doesn't support sharing a single GPU across pods, you can look into the Kubeflow project.
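In other words, GPU requests must be whole integers. A minimal sketch of a valid manifest fragment (using the `nvidia.com/gpu` resource name from the NVIDIA device plugin, which replaced the deprecated `alpha.kubernetes.io/nvidia-gpu` shown in the question):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # whole GPUs only; fractional values are rejected
```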

-- iamvishnuks
Source: StackOverflow