I am running inference on a vast.ai instance with NVIDIA driver 560.28.03 and CUDA 12.6. I am using llama.cpp to run a GGUF version of Mistral, but when I run my code it only uses the CPU.
Any help getting it to run on the GPU is appreciated.
I already pass the following settings in the code: `"n_gpu_layers": -1, "n_ctx": 2048`
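For context, the model is loaded roughly like this (a sketch using llama-cpp-python; the model filename is a placeholder, and the two parameters are the ones above):

```python
from llama_cpp import Llama

# Placeholder path for the actual Mistral GGUF file on the instance.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 is intended to offload all layers to the GPU
    n_ctx=2048,       # context window size
)
```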
I printed the GPU utilization and here is what I get:
```
GPU Total Memory: 40960.00 MB
GPU Used Memory: 1016.62 MB
GPU Free Memory: 39943.38 MB
GPU Utilization: 0%
GPU Memory Utilization: 0%
```
But it's still only using the CPU.
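For reference, the readings above were printed with something along these lines (a sketch assuming the `pynvml` NVML bindings; it only runs on a machine with an NVIDIA GPU):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU on the instance

# Memory info is reported in bytes; utilization rates in percent.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)

print(f"GPU Total Memory: {mem.total / 1024**2:.2f} MB")
print(f"GPU Used Memory: {mem.used / 1024**2:.2f} MB")
print(f"GPU Free Memory: {mem.free / 1024**2:.2f} MB")
print(f"GPU Utilization: {util.gpu}%")
print(f"GPU Memory Utilization: {util.memory}%")

pynvml.nvmlShutdown()
```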