Kubernetes: scheduling GPU pods in a load-balanced manner

7/23/2019

I have a Kubernetes cluster with n nodes, some of which are fitted with multiple NVIDIA 1080Ti GPU cards.

I have two kinds of pods:
1. GPU-enabled: these need to be scheduled on GPU-fitted nodes, where each pod will use only one of the GPU cards present on that node.
2. CPU-only: these can be scheduled anywhere, preferably on CPU-only nodes.
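To make the two pod kinds concrete, here is a minimal sketch. It assumes the NVIDIA device plugin is running, so that GPU nodes advertise the `nvidia.com/gpu` resource, and that GPU nodes carry a hypothetical label `hardware-type=gpu`. The pod names, the label, and the `busybox` image are illustrative and not part of the actual setup.

```yaml
# GPU-enabled pod: asks for exactly one GPU, so it can only be placed
# on a node that advertises the nvidia.com/gpu resource.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker                  # hypothetical name
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          nvidia.com/gpu: 1         # one card out of those on the node
---
# CPU-only pod: may run anywhere, but prefers nodes that do not carry
# the (assumed) GPU label.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-worker                  # hypothetical name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: hardware-type  # assumed node label
                operator: NotIn
                values: ["gpu"]
  containers:
    - name: app
      image: busybox                # placeholder image
      command: ["sleep"]
      args: ["100000"]
```

The preference is soft, so the CPU-only pod can still fall back to a GPU node when no CPU-only node has room.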

The scheduling problem is addressed clearly in this answer.

Issue: When scheduling a GPU-enabled pod on a GPU-fitted node, I want to be able to decide which of that node's GPU cards my pod is going to use. Further, I was thinking of a load balancer sitting transparently between the GPU hardware and the pods that would decide the mapping.

Any help with this architecture would be deeply appreciated. Thank you!

-- Schrodinger
kubernetes
load-balancing
nvidia

1 Answer

7/23/2019

You have to use the official NVIDIA GPU device plugin rather than the one suggested by GCE. With it, GPUs can be scheduled by attributes:

Pods can specify device selectors based on the attributes that are advertised on the node. These can be specified at the container level. For example:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["sleep"]
      args: ["100000"]
      computeResourceRequests: ["nvidia-gpu"]
  computeResources:
    - name: "nvidia-gpu"
      resources:
        limits:
          nvidia.com/gpu: 1
      affinity:
        required:
          - key: "nvidia.com/gpu-memory"
            operator: "Gt"
            values: ["8000"] # change value to appropriate mem for GPU
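The `computeResourceRequests` and `computeResources` fields above appear to be specific to the Kubernetes build described in NVIDIA's guide. For comparison, a rough equivalent using only standard Pod fields might look like the sketch below. It assumes the nodes carry a label such as `nvidia.com/gpu.memory` (in MiB), which NVIDIA's GPU feature discovery tooling can publish. Check the labels actually present on your nodes before relying on it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.memory   # assumed node label
                operator: Gt
                values: ["8000"]             # change to the GPU memory you need
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          nvidia.com/gpu: 1                  # request a single GPU
```

Note that this selects a node, not a particular card. With `nvidia.com/gpu: 1`, the choice of card on that node is made by the kubelet and the device plugin, and the container only sees the card it was assigned.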

Check the Kubernetes on NVIDIA GPUs Installation Guide.

Hope this helps.

-- A_Suh
Source: StackOverflow