I want to move my PyTorch project to GPUs for training. I am working inside a Docker container, so I tested torch.cuda.is_available(), which returned False.
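The check above can be made a little more informative. The following is a minimal sketch, not a definitive diagnostic: the helper name cuda_report is hypothetical, and it assumes only that PyTorch may or may not be installed in the container. It reports whether the installed PyTorch build was compiled with CUDA support at all, since torch.version.cuda is None for CPU-only builds.

```python
import importlib.util


def cuda_report():
    """Collect the PyTorch facts that usually explain a False result.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the script also runs without PyTorch

    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda comes back as None, the installed wheel is a CPU-only build and torch.cuda.is_available() will return False regardless of how the container is configured.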
This is what my Dockerfile looks like:
FROM mcr.microsoft.com/devcontainers/python:0-3.11
ENV PATH=/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
WORKDIR /app
CMD ["/bin/bash"]
I then ran nvidia-smi, which gave me this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:01:00.0 Off |                    0 |
| N/A   29C    P0    31W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:41:00.0 Off |                    0 |
| N/A   30C    P0    34W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  Off  | 00000000:81:00.0 Off |                    0 |
| N/A   29C    P0    32W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  Off  | 00000000:C1:00.0 Off |                    0 |
| N/A   29C    P0    33W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
But when I run nvcc --version inside the container, it returns bash: nvcc: command not found. This does not happen when I run nvcc --version outside of the container, so I assume the problem has something to do with my Docker container.
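One way to narrow this down is to check whether the directories named in the Dockerfile actually exist inside the container. nvidia-smi ships with the NVIDIA driver, while nvcc ships with the CUDA toolkit, and the ENV lines only extend the search path without installing anything. The sketch below uses only the Python standard library; the helper name toolkit_report is hypothetical, and the default path /usr/local/cuda is taken from the Dockerfile above.

```python
import os
import shutil


def toolkit_report(cuda_home="/usr/local/cuda"):
    """Report whether the CUDA toolkit is visible from this environment.

    Hypothetical helper for illustration; cuda_home defaults to the
    directory that the Dockerfile adds to PATH and LD_LIBRARY_PATH.
    """
    path_entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return {
        # Full path to the executable, or None if it is not on PATH.
        "nvcc_on_path": shutil.which("nvcc"),
        "nvidia_smi_on_path": shutil.which("nvidia-smi"),
        # False here would mean the toolkit was never installed in the image.
        "cuda_home_exists": os.path.isdir(cuda_home),
        "cuda_bin_exists": os.path.isdir(os.path.join(cuda_home, "bin")),
        # PATH entries that point at directories that do not exist.
        "missing_path_entries": [p for p in path_entries if not os.path.isdir(p)],
    }


if __name__ == "__main__":
    for key, value in toolkit_report().items():
        print(f"{key}: {value}")
```

If cuda_home_exists is False inside the container but True on the host, the toolkit is present only on the host, which would be consistent with nvcc working outside the container and failing inside it.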
Any tips on what I might be doing wrong?