Kubernetes implicitly uses Nvidia files in container

11/20/2019

I have a Docker image that I want to deploy in Kubernetes. The image is based on nvidia/cuda:10.0-base. One of the commands in the entrypoint is rm -r /usr (yes, this command raises questions, but it is needed).

The container works fine when I run it with Docker; I am sure the entrypoint executes correctly and completely. But when I try to deploy the same image on my k8s cluster, the container crashes with the following errors:

rm: cannot remove '/usr/bin/nvidia-smi': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-persistenced': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-cuda-mps-server': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-cuda-mps-control': Device or resource busy
rm: cannot remove '/usr/bin/nvidia-debugdump': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libcuda.so.410.104': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.430.26': Device or resource busy
rm: cannot remove '/usr/lib/x86_64-linux-gnu/libcuda.so.430.26': Device or resource busy

I successfully deployed this container with a different entrypoint and opened a shell in it with kubectl exec -it. When I try to remove, for example, /usr/bin/nvidia-smi, the same Device or resource busy error is raised.

Neither top nor lsof shows any process that uses /usr/bin/nvidia-smi or any of the other files listed above.

top output:

      1 root      20   0    4636    848    768 S   0.0  0.0   0:00.05 sh                                                                                                                                                                                                                  
     19 root      20   0   72304   5860   5096 S   0.0  0.0   0:00.00 sshd                                                                                                                                                                                                                
     25 root      20   0   21540   4056   3456 S   0.0  0.0   0:00.09 bash                                                                                                                                                                                                                
    447 root      20   0   39512   3740   3196 R   0.0  0.0   0:00.00 top

How can k8s influence the behavior of the container?

-- Fedor
bash
docker
kubernetes
linux
nvidia

1 Answer

11/21/2019

Answer:

The files listed as busy in the question were added to the container by Kubernetes, not by the image. On GPU nodes, the NVIDIA container runtime bind-mounts the host's driver binaries and libraries (nvidia-smi, libcuda.so, and so on) into the container when it starts. Each of these files is an individual mount point, and a mount point cannot be deleted with rm — hence "Device or resource busy" even though lsof shows no process holding the files open.
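One way to verify this from inside the container is to scan the mount table: bind-mounted files show up as their own entries in /proc/mounts. The helper below is a sketch (the function name and interface are made up for illustration); it lists every mount point under a given prefix, i.e. exactly the files rm cannot remove:

```shell
# list_protected MOUNTS_FILE PREFIX
# Print mount points under PREFIX from a mounts table (field 2 of each
# line is the mount point, as in /proc/mounts). These entries are
# bind-mounted into the container and cannot be removed with rm.
list_protected() {
  awk -v p="$2" 'index($2, p) == 1 { print $2 }' "$1"
}

# Usage inside the running container:
#   list_protected /proc/mounts /usr/
```

With that list in hand, the entrypoint could skip the bind-mounted paths instead of running a blanket rm -r /usr.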

-- Fedor
Source: StackOverflow