Preface
I have successfully fine-tuned a LLaMA 2 model using 4-bit quantization with the bitsandbytes library, and now I want to load it from the Hugging Face Hub. These are my device and driver specifications (on a Google Cloud VM):
nvidia-smi
nvcc --version
Implementation
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

finetuned_model_id = "ferguso/llama-2-13b-detect"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
finetuned_model = AutoModelForCausalLM.from_pretrained(
    finetuned_model_id,
    quantization_config=quantization_config,
)
Versions: torch==1.13.0, bitsandbytes==0.43.1
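For completeness, here is the minimal check I use to confirm what torch itself reports about the GPU (nothing beyond the standard torch API is assumed here):

import torch
print(torch.__version__)          # 1.13.0 in my case
print(torch.version.cuda)         # CUDA version torch was built against
print(torch.cuda.is_available())  # whether the GPU is visible to torch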
Issue and Question
When I try to load the model, it fails with this error:
AttributeError: 'NoneType' object has no attribute 'cquantize_blockwise_fp16_fp4'
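From what I can tell, cquantize_blockwise_fp16_fp4 is a symbol in the compiled bitsandbytes CUDA library, so the NoneType suggests the native binary was never loaded (bitsandbytes also offers a fuller report via python -m bitsandbytes). A minimal sketch of how I would check this, assuming a standard pip install:

# the native CUDA library is loaded at import time; if that fails,
# bitsandbytes usually only warns, and the AttributeError shows up later
import bitsandbytes as bnb
print(bnb.__version__)  # 0.43.1 in my environment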
What causes this error, and how can I fix it?