How to run a data science model's Docker image on Azure Kubernetes Service (GPU enabled)

9/6/2019

How can I run a data science model's Docker image on Azure Kubernetes Service (GPU enabled) so that it utilizes the GPU capability of the Kubernetes cluster? The packages we use to build models are TensorFlow, Keras, scikit-learn, etc. Do we need to include CUDA installation steps in the Dockerfile?


The error is shown in a screenshot (not reproduced here).

-- Chaitanya Kirty
azure-kubernetes
data-science
docker
kubernetes
python

1 Answer

9/6/2019
  1. Create the AKS cluster with a VM size that supports GPUs (e.g., the NC series).
  2. Install the NVIDIA device plugin on the cluster.
  3. Run your workload.
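
Step 1 can be sketched with the Azure CLI. The resource group and cluster names below are placeholders, and `Standard_NC6` is just one example of a GPU-capable VM size:

```shell
# Create an AKS cluster whose node pool uses a GPU-capable VM size
# (NC-series VMs carry NVIDIA GPUs). Names are illustrative.
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-vm-size Standard_NC6 \
    --node-count 1

# Fetch credentials so kubectl talks to the new cluster
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
```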

For step 2, apply the following YAML:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
      # reserves resources for critical add-on pods so that they can be rescheduled after
      # a failure.  This annotation works in tandem with the toleration below.
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
      # This, along with the annotation above marks this pod as a critical add-on.
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - image: nvidia/k8s-device-plugin:1.11
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
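
Once the device plugin is running, step 3 is a matter of requesting a GPU in the workload's resource limits. A minimal sketch (pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
  - name: model
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 1   # schedule onto a GPU node and claim one GPU
```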

The official AKS docs cover this, and you can also consult the official Kubernetes docs on scheduling GPUs. I think you should just use a base image that already contains the GPU libraries rather than installing CUDA yourself in the Dockerfile — for example, you can use the sample image Microsoft provides as a base, or something like tensorflow/tensorflow:latest-gpu.
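
As a sketch of that approach, a Dockerfile can start from the GPU-enabled TensorFlow image and only add the remaining Python packages. The package list and `train.py` entry point below are assumptions based on the question:

```dockerfile
# CUDA and cuDNN already ship in this base image; no manual CUDA install needed
FROM tensorflow/tensorflow:latest-gpu

# Add the other model-building packages mentioned in the question
RUN pip install --no-cache-dir keras scikit-learn

WORKDIR /app
COPY . /app

# Placeholder entry point for the model code
CMD ["python", "train.py"]
```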

-- 4c74356b41
Source: StackOverflow