TensorFlow Serving on Docker: failed call to cuInit: CUresult(-1)

7/25/2018

When using the image tensorflow/serving:latest-devel-gpu on Kubernetes, the GPU isn't being used.

I'm not doing anything fancy with it; I simply pass a server.conf and the model files.

The default runtime is nvidia-docker, and my other GPU pod is able to use the GPU.
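
For reference, the launch boils down to something like the sketch below; the mount paths, port, and config file name are placeholders, and in the devel image the tensorflow_model_server binary may sit at a different path.

    # Illustrative launch of the serving image with the NVIDIA runtime.
    # Paths, port, and config file name are placeholders, not the real setup.
    docker run --rm --runtime=nvidia \
      -p 8500:8500 \
      -v /opt/models:/models \
      tensorflow/serving:latest-devel-gpu \
      tensorflow_model_server \
        --port=8500 \
        --model_config_file=/models/server.conf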

The only error in the log:

E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUresult(-1)

Something else interesting in the log:

I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
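
One way to check which libcuda the container can actually resolve, and whether the CUDA stubs directory is on the library path (the pod name below is a placeholder):

    # <serving-pod> is a placeholder; run the checks inside the serving container.
    kubectl exec -it <serving-pod> -- bash -c \
      'echo $LD_LIBRARY_PATH; ls -l /usr/local/cuda/lib64/stubs/; ldconfig -p | grep libcuda'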

-- aclokay
docker
kubernetes
tensorflow
tensorflow-serving

2 Answers

7/26/2018

There are several issues on the tracker: #394, #2882, #646.

In brief, there are several solutions that have worked; try one at a time (a quick verification sketch follows the list):

  1. Run:

    $ sudo apt-get install nvidia-modprobe  
    $ sudo reboot
  2. Run:

    $ nvidia-cuda-mps-server
  3. Run the following (driver_version_num was 384 in my case):

    $ sudo modinfo nvidia-<driver_version_num>-uvm
    $ sudo modprobe --force-modversion nvidia-<driver_version_num>-uvm
  4. Upgrade CUDA and cuDNN: I was on CUDA 8 with cuDNN 6.0
    and moved to CUDA 9 with cuDNN 7.0.
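
After applying any of options 1, 2 or 3 on the worker node, a quick sanity check (standard NVIDIA driver tooling, not part of the original steps) is:

    # Run on the worker node; output varies per setup.
    lsmod | grep -i nvidia      # nvidia and nvidia_uvm modules should be listed
    nvidia-smi                  # the driver should report the GPU without error
    ls -l /dev/nvidia*          # device nodes, including /dev/nvidia-uvm, should exist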

Since you run TensorFlow as a pod, I would guess that solutions 1, 2 and 3 should be applied to the worker node, while for solution 4 you may need to update your TensorFlow Docker image.

-- VAS
Source: StackOverflow

9/23/2018

Update your Dockerfile to remove the CUDA stub library, which can shadow the real libcuda.so.1 that the NVIDIA runtime mounts into the container:

RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1

Or extend the serving devel GPU image from Docker Hub with one extra line:

FROM tensorflow/serving:1.9.0-devel-gpu
RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1
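
A minimal way to build and smoke-test the patched image (the image tag below is a placeholder):

    # Build the patched image from the Dockerfile above.
    docker build -t tf-serving-gpu-nostub .

    # With the NVIDIA runtime, the stub should be gone and the loader should
    # now find the driver-provided libcuda.so.1.
    docker run --rm --runtime=nvidia --entrypoint bash tf-serving-gpu-nostub \
      -c 'ls -l /usr/local/cuda/lib64/stubs/; ldconfig -p | grep libcuda'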
-- Srini Reddy
Source: StackOverflow