I’m trying to set up a development environment using PyTorch and nvidia/cuda, but it’s not working.
The following command works as expected and recognizes the GPUs:
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.08-py3
(from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
(And it takes a few minutes to download and start the container.)
But if I use the following Dockerfile, the GPUs are NOT recognized (I’m trying to build the container from ./devcontainer/Dockerfile in VS Code on Windows 11 with WSL2):
ARG gpus=all
FROM nvcr.io/nvidia/pytorch:24.08-py3
Trying to run
python -c 'import torch; print(torch.cuda.current_device())'
gives this error:
RuntimeError: Found no NVIDIA driver on your system.
Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
The command ‘nvcc --version’ runs OK:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:34:21_PDT_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0
But the command ‘nvidia-smi’ returns nothing (while it works normally in the container started with the ‘docker run’ command above).
Can someone please give me a hint about how to correct this?
The correct way to pass the “--gpus” option to the container is through the devcontainer.json file (NOT a Dockerfile ARG).
To do this, add the following line to your devcontainer.json:
"runArgs": ["--gpus=all"],