When using the image tensorflow/serving:latest-devel-gpu
on Kubernetes, the GPU isn't being used.
I don't do anything fancy with it; I simply pass server.conf and the model files.
The default runtime is nvidia-docker, and my other GPU pod is able to use the GPU.
The only error in the log is:
E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:397 ] failed call to cuInit: CUresult(-1)
Something else which is interesting:
I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
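That diagnostic means the process never found the real driver library and may only be seeing the CUDA toolkit's stub copy. A quick way to check which libcuda copies are visible inside the container (a hedged sketch; the paths are the usual defaults, not taken from the original report):

```shell
#!/bin/sh
# List every libcuda.so the dynamic linker knows about. On a healthy
# GPU node the driver's real library should appear; seeing nothing, or
# only the toolkit stub, matches the "unable to find libcuda.so DSO"
# diagnostic above.
ldconfig -p | grep libcuda || echo "no libcuda registered with ldconfig"

# The stub that ships with the CUDA toolkit / devel images:
ls -l /usr/local/cuda/lib64/stubs/libcuda.so* 2>/dev/null \
  || echo "no CUDA stub directory present"
```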
There are several issues on the tracker: #394, #2882, #646.
In brief, these solutions have worked (try them one at a time):
Solution 1. Run:
$ sudo apt-get install nvidia-modprobe
$ sudo reboot
Solution 2. Run:
$ nvidia-cuda-mps-server
Solution 3. Run the following:
$ sudo modinfo nvidia-<driver_version_num>-uvm
$ sudo modprobe --force-modversion nvidia-<driver_version_num>-uvm
(driver_version_num was 384 in my case)
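Before force-loading, you can confirm which versioned uvm module actually ships for your kernel (a sketch; the module tree path is the standard one, and the output will differ per driver version):

```shell
#!/bin/sh
# Look for the versioned nvidia-uvm module built for the running kernel;
# the name found here (e.g. nvidia-384-uvm) is what modinfo and
# modprobe expect in the commands above.
find /lib/modules/"$(uname -r)" -name 'nvidia*uvm*' 2>/dev/null || true

# Check whether a uvm module is already loaded:
grep -i uvm /proc/modules || echo "no uvm module currently loaded"
```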
Solution 4. Upgrade CUDA and cuDNN: I was on CUDA 8 with cuDNN 6.0 and moved to CUDA 9 with cuDNN 7.0.
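To see which CUDA toolkit and cuDNN versions an image actually contains before upgrading, something like this works (a sketch; the header and binary locations are common defaults and vary between images, e.g. cuDNN's header may live under /usr/local/cuda/include instead):

```shell
#!/bin/sh
# Report the CUDA toolkit version via nvcc, if present.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | tail -n 1
else
  echo "nvcc not on PATH"
fi

# cuDNN advertises its version in its header.
h=/usr/include/cudnn.h
if [ -f "$h" ]; then
  grep 'CUDNN_MAJOR\|CUDNN_MINOR\|CUDNN_PATCHLEVEL' "$h" | head -n 3
else
  echo "cudnn.h not found at $h"
fi
```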
Since you run TensorFlow Serving as a pod, my guess is that solutions 1, 2, and 3 should be applied on the worker node, while for solution 4 you may need to update your TensorFlow Docker image.
Update your Dockerfile with:
RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1
or extend the serving devel GPU image from Docker Hub with one extra line:
FROM tensorflow/serving:1.9.0-devel-gpu
RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1
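Putting that together, a minimal way to produce the patched image might look like this (a sketch: the directory name and the my-registry tag are hypothetical, and the actual build requires Docker plus access to the base image). Removing the stub libcuda.so.1 lets the real driver library injected by nvidia-docker be found at runtime:

```shell
#!/bin/sh
# Write a Dockerfile that extends the serving devel GPU image and
# removes the CUDA toolkit's stub libcuda.so.1.
mkdir -p serving-gpu-fix
cat > serving-gpu-fix/Dockerfile <<'EOF'
FROM tensorflow/serving:1.9.0-devel-gpu
RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1
EOF

cat serving-gpu-fix/Dockerfile
# Then build and push (requires Docker and registry access):
#   docker build -t my-registry/serving:1.9.0-devel-gpu-fixed serving-gpu-fix
#   docker push my-registry/serving:1.9.0-devel-gpu-fixed
```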