I’ve been using the llama-cpp-python library for some time now, and in earlier versions I could easily check GPU availability by inspecting the GGML_USE_CUBLAS variable or by checking for the ggml_init_cublas attribute of the llama shared library, as follows:
# For older versions
from llama_cpp.llama_cpp import GGML_USE_CUBLAS

def is_gpu_available_v1() -> bool:
    return GGML_USE_CUBLAS

# For later versions
from llama_cpp.llama_cpp import _load_shared_library

def is_gpu_available_v2() -> bool:
    lib = _load_shared_library('llama')
    return hasattr(lib, 'ggml_init_cublas')
Both approaches seemed to be tied to cuBLAS, and while I’m not exactly sure where ggml_init_cublas originates, they sufficed for me: they reliably predicted whether a CUDA-enabled GPU would be used (as indicated by fast responses and high GPU utilization) when the library was installed with the following command:

CUDACXX=$MY_NVCC_PATH CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
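(For completeness, my check of whether the GPU is "being used" is indirect: I load a model with full offloading and watch the startup log and nvidia-smi, roughly as in the sketch below. The model path is just a placeholder.)

# Indirect verification only -- the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-model.gguf",  # placeholder path
    n_gpu_layers=-1,  # request that all layers be offloaded; has no effect on a CPU-only build
    verbose=True,     # the load log reports how many layers were actually offloaded
)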
However, in the latest versions of llama-cpp-python, the GGML_USE_CUBLAS variable appears to have been removed, and hasattr(lib, 'ggml_init_cublas') now consistently evaluates to False, regardless of GPU availability.
Could someone please guide me on how to programmatically check for GPU availability in the latest version of the llama-cpp-python library? Any insights or alternative approaches would be appreciated.
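For reference, the kind of check I’m hoping for would look something like the sketch below. I noticed that recent llama.cpp builds expose a llama_supports_gpu_offload function, but I haven’t confirmed that the Python bindings export it in every version, or that it is the intended replacement, so please treat this as a guess rather than something I know works:

# A guess, not a confirmed solution: assumes the low-level binding
# llama_supports_gpu_offload exists in the installed version.
def is_gpu_available_v3() -> bool:
    try:
        from llama_cpp.llama_cpp import llama_supports_gpu_offload
    except ImportError:
        # Symbol not exposed by this version of the bindings.
        return False
    return bool(llama_supports_gpu_offload())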