I am running a GPU instance on GKE. Once everything is deployed and I make a request to the service, the above-mentioned error occurs. I followed all the steps mentioned in https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#ubuntu. This is my Dockerfile:
FROM nvidia/cuda:10.2-cudnn7-devel
# install nginx
# RUN apt-get update && apt-get install nginx vim -y --no-install-recommends
# RUN ln -sf /dev/stdout /var/log/nginx/access.log \
# && ln -sf /dev/stderr /var/log/nginx/error.log
## Setup
RUN mkdir -p /opt/app
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
python3-dev \
python3-pip \
python3-wheel \
python3-setuptools && \
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
RUN pip3 install --no-cache-dir -U setuptools pip
RUN pip3 install --no-cache-dir cupy_cuda102==8.0.0rc1 scipy optuna
COPY requirements.txt start.sh run.py uwsgi.ini utils.py /opt/app/
COPY shading_characteristics /opt/app/shading_characteristics
WORKDIR /opt/app
RUN pip install -r requirements.txt
RUN pip install --upgrade 'sentry-sdk[flask]'
RUN pip install uwsgi -I --no-cache-dir
EXPOSE 5000
## Start the server, giving permissions for script
# COPY nginx.conf /etc/nginx
RUN chmod +x ./start.sh
RUN chmod -R 777 /root
CMD ["./start.sh"]
Edit (May 2021)
GKE now officially supports NVIDIA driver version 450.102.04, which supports CUDA 10.2. Please note that GKE 1.19.8-gke.1200 or higher is required.
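If your cluster is already on a supported release, it may be enough to upgrade the GPU node pool and re-apply the driver installer DaemonSet from the guide you linked. A minimal sketch, assuming placeholder cluster, node pool and zone names; the DaemonSet URL is the one referenced by the Ubuntu section of that guide, so verify it against the current docs:
# Check which GKE version your GPU nodes are running (needs 1.19.8-gke.1200 or higher)
kubectl get nodes -o wide
# Upgrade the GPU node pool to a supported version (my-cluster, gpu-pool and the zone are placeholders)
gcloud container clusters upgrade my-cluster \
  --zone us-central1-a \
  --node-pool gpu-pool \
  --cluster-version 1.19.8-gke.1200
# Re-apply the NVIDIA driver installer DaemonSet for Ubuntu node images
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml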
As you can see on Nvidia's website, CUDA 10.2 requires Nvidia driver version >= 440.33. Since the latest Nvidia driver officially available in GKE is 418.74, the newest CUDA version you can use at the moment is 10.1.
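You can confirm which driver version your nodes actually run with nvidia-smi from any GPU pod that is already scheduled (the pod name below is a placeholder):
# Print the driver and CUDA versions visible inside a running GPU pod
kubectl exec -it my-gpu-pod -- nvidia-smi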
If your application, or other dependencies such as PyTorch, can function properly with CUDA 10.1, the fastest solution is to switch your base Docker image to one built for CUDA 10.1, as sketched below.
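For the Dockerfile in the question, that would mean changing the base image and the CuPy wheel to their CUDA 10.1 variants, roughly like this (the exact tags are assumptions, so check that they still exist on Docker Hub and PyPI):
# CUDA 10.1 base image compatible with driver 418.74 (tag is an assumption; verify on Docker Hub)
FROM nvidia/cuda:10.1-cudnn7-devel
# CuPy wheel built against CUDA 10.1 instead of 10.2
RUN pip3 install --no-cache-dir cupy_cuda101==8.0.0rc1 scipy optuna
# (rest of the Dockerfile unchanged)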
There are unofficial ways to install newer Nvidia driver versions on GKE nodes running COS, but unless that's a must for you, I'd stick to the official and supported GKE method and use CUDA 10.1.