I have been experimenting with parameter-efficient fine-tuning (PEFT) using LoRA on some Hugging Face models. After running and documenting many experiments, I noticed that I never specified the quantization type (8-bit or 4-bit) when defining my BitsAndBytesConfig, as follows:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Only the fp32 CPU offload option for int8 modules is set here;
# neither load_in_8bit nor load_in_4bit is specified.
quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,  # the model id of the checkpoint being fine-tuned
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
    device_map="auto",
)
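For comparison, explicitly requesting 8-bit quantization would, as I understand it, look something like this (a sketch of the intended usage, not what I actually ran):

# Hypothetical alternative: explicitly enable 8-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)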
I checked the default values of BitsAndBytesConfig and found that both load_in_8bit and load_in_4bit default to False. What happens in this case? Is any quantization actually applied?
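For reference, this is a minimal sketch of how I inspected the defaults (only the transformers library is assumed to be installed):

from transformers import BitsAndBytesConfig

# A config created with no arguments: both quantization flags default to False
default_config = BitsAndBytesConfig()
print(default_config.load_in_8bit)  # False
print(default_config.load_in_4bit)  # False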