Error while launching a GPU instance with Kubernetes and Docker

11/25/2019

I am trying to create a GPU instance with CUDA installed, running as a Kubernetes pod built from a Dockerfile, on AWS with Debian OS. I want to specify the GPU configuration in the Dockerfile itself. A job then deploys a Flask application on an instance using that Dockerfile configuration.

Please find the Dockerfile below:

FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04
ARG PYTHON_VERSION=3.6
ARG WITH_TORCHVISION=1
RUN apt-get update && apt-get install -y --no-install-recommends \
       build-essential \
         cmake \
         git \
         curl \
         ca-certificates \
         libjpeg-dev \
         libpng-dev && \
     rm -rf /var/lib/apt/lists/*


RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  && \
     chmod +x ~/miniconda.sh && \
     ~/miniconda.sh -b -p /opt/conda && \
     rm ~/miniconda.sh && \
     /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include ninja cython typing && \
     /opt/conda/bin/conda install -y -c pytorch magma-cuda100 && \
     /opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/bin:$PATH
# This must be done before pip so that requirements.txt is available
WORKDIR /opt/pytorch
COPY . .

RUN git submodule update --init --recursive
RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    pip install -v .

RUN if [ "$WITH_TORCHVISION" = "1" ] ; then git clone https://github.com/pytorch/vision.git && cd vision && pip install -v . ; else echo "building without torchvision" ; fi

WORKDIR /workspace
RUN chmod -R a+w .
FROM python:3.6
WORKDIR /usr/src/app
COPY requirements.txt .
ADD https://fs8.transfernow.net/download/5dd4e47f203f8/master/resnext101_32x4d_True_freeze_True_freeze_initial_layer.pth /usr/src/app
RUN pip install --no-cache-dir -r requirements.txt
COPY flaskdir/ .
COPY tars/ /usr/src/app/tars/
CMD ["python3.6", "flask_server.py"]
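One thing worth checking in a Dockerfile like this is the second FROM line. Each FROM starts a new build stage, and the final image comes from the last stage only, so files from an earlier stage carry over only through COPY --from. The following is a rough sketch of that check, not a real Dockerfile parser. The helper names list_stages and unused_stages are made up for illustration, and the heuristic only looks at FROM lines and --from= references.

```python
import re

# Hypothetical helpers for illustration; this is a heuristic, not a full Dockerfile parser.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)


def list_stages(dockerfile_text):
    """Return (index, base_image, alias) for every FROM instruction, in order."""
    stages = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match:
            stages.append((len(stages), match.group(1), match.group(2)))
    return stages


def unused_stages(dockerfile_text):
    """Return non-final stages that nothing later refers to.

    A stage counts as used if some instruction has --from=<index or alias>,
    or if a FROM line uses its alias as a base image.
    """
    stages = list_stages(dockerfile_text)
    refs = set(re.findall(r"--from=(\S+)", dockerfile_text))
    bases = {base for _, base, _ in stages}
    unused = []
    for index, base, alias in stages[:-1]:  # the last stage is the final image
        used = str(index) in refs or (
            alias is not None and (alias in refs or alias in bases)
        )
        if not used:
            unused.append((index, base, alias))
    return unused


if __name__ == "__main__":
    sample = (
        "FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04\n"
        "RUN echo building\n"
        "FROM python:3.6\n"
        'CMD ["python3.6", "flask_server.py"]\n'
    )
    for index, base, alias in unused_stages(sample):
        print("stage", index, "based on", base, "is never copied from")
```

Run against the Dockerfile above, this kind of check would report the nvidia/cuda stage as unreferenced. That means the CUDA, conda, and PyTorch layers would not be part of the image that runs flask_server.py.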

The error occurs at Step 10, when running pip install -v . in /opt/pytorch. pip reports that neither 'setup.py' nor 'pyproject.toml' was found:

Step 6/21 : ENV PATH /opt/conda/bin:$PATH
 ---> Running in fce72ddfda8b
Removing intermediate container fce72ddfda8b
 ---> 7ffaa404bc9f
Step 7/21 : WORKDIR /opt/pytorch
 ---> Running in bd1a87951a0f
Removing intermediate container bd1a87951a0f
 ---> a20e06b1f368
Step 8/21 : COPY . .
 ---> 614cebaad94c
Step 9/21 : RUN git submodule update --init --recursive
 ---> Running in 13cc38c14a28
Removing intermediate container 13cc38c14a28
 ---> e9f7604030d6
Step 10/21 : RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all"     CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"     pip install -v .
 ---> Running in 92c5ae38a4c2
Created temporary directory: /tmp/pip-ephem-wheel-cache-e_0wcpcl
Created temporary directory: /tmp/pip-req-tracker-7gp_qslv
Created requirements tracker '/tmp/pip-req-tracker-7gp_qslv'
Created temporary directory: /tmp/pip-install-o4hu8ioz
Cleaning up...
Removed build tracker '/tmp/pip-req-tracker-7gp_qslv'
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
Exception information:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 153, in _main
    status = self.run(options, args)
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 362, in run
    wheel_cache
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 238, in populate_requirement_set
    wheel_cache=wheel_cache
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/req/constructors.py", line 395, in install_req_from_line
    parts = parse_req_from_line(name, line_source)
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/req/constructors.py", line 324, in parse_req_from_line
    url = _get_url_from_path(p, name)
  File "/opt/conda/lib/python3.6/site-packages/pip/_internal/req/constructors.py", line 280, in _get_url_from_path
    "nor 'pyproject.toml' found." % name
pip._internal.exceptions.InstallationError: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
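pip raises this error when the directory it is given contains neither setup.py nor pyproject.toml. In this build that directory is /opt/pytorch, which COPY . . fills from the build context, that is, the directory passed to docker build. A minimal sketch for checking the context on the host before building is shown below. The helper names is_installable_dir and find_installable_subdirs are hypothetical and only mimic the file check; they are not pip's own code.

```python
import os

# Hypothetical helpers for illustration; they mimic the file check behind the
# error above but are not part of pip.
MARKER_FILES = ("setup.py", "pyproject.toml")


def is_installable_dir(path):
    """Return True if `path` directly contains setup.py or pyproject.toml."""
    return any(os.path.isfile(os.path.join(path, name)) for name in MARKER_FILES)


def find_installable_subdirs(root):
    """Return paths (relative to `root`) of the topmost directories holding a marker file."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if any(name in filenames for name in MARKER_FILES):
            found.append(os.path.relpath(dirpath, root))
            dirnames[:] = []  # stop descending once a project root is found
    return sorted(found)


if __name__ == "__main__":
    # Assumption: the script is run from the directory given to `docker build`.
    print("build context root is installable:", is_installable_dir("."))
```

If the root check prints False, then pip install . inside the image has nothing to install. This would be the case if the Dockerfile sits in the Flask project directory rather than in a checkout of the PyTorch sources.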

Is there a mistake in the Dockerfile commands?

-- Prabhjot
docker
kubernetes
pip
python
pytorch
