I'm trying to save my fine-tuned model so that I don't need to re-download the base model every time I want to use it, but nothing I've tried works. I would appreciate your help.
The following parameters are used for the training:
hf_model_name = "tiiuae/falcon-7b-instruct"
dir_path = 'Tiiuae-falcon-7b-instruct'
model_name_is = f"peft-training"
output_dir = f'{dir_path}/{model_name_is}'
logs_dir = f'{dir_path}/logs'
model_final_path = f"{output_dir}/final_model/"
EPOCHS = 3500
LOGS = 1
SAVES = 700
EVALS = EPOCHS // 100  # integer number of steps between evaluations
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
"tiiuae/falcon-7b-instruct",
quantization_config=bnb_config,
device_map={"": 0},
trust_remote_code=False
)
peft_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.05, # 0.1
r=64,
bias="lora_only", # none
task_type="CAUSAL_LM",
target_modules=[
"query_key_value"
],
)
model.config.use_cache = False
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token
training_arguments = TrainingArguments(
output_dir=output_dir,
per_device_train_batch_size=1,
gradient_accumulation_steps=4,
optim='paged_adamw_32bit',
max_steps=EPOCHS,
save_steps=SAVES,
logging_steps=LOGS,
logging_dir=logs_dir,
eval_steps=EVALS,
evaluation_strategy="steps",
fp16=True,
learning_rate=0.001,
max_grad_norm=0.3,
warmup_ratio=0.15, # 0.03
lr_scheduler_type="constant",
disable_tqdm=True,
)
model.config.use_cache = False
trainer = SFTTrainer(
model=model,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
peft_config=peft_config,
dataset_text_field="text",
max_seq_length=448,
tokenizer=tokenizer,
args=training_arguments,
packing=True,
)
for name, module in trainer.model.named_modules():
    if "norm" in name:
        # cast normalization layers to float32 (Module.to works in place)
        module.to(torch.float32)
train_result = trainer.train()
I then saved the model like this:
metrics = train_result.metrics
max_train_samples = len(train_dataset)
metrics["train_samples"] = min(max_train_samples, len(train_dataset))
# save train results
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
# compute evaluation results
metrics = trainer.evaluate()
max_val_samples = len(eval_dataset)
metrics["eval_samples"] = min(max_val_samples, len(eval_dataset))
# save evaluation results
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
model.save_pretrained(model_final_path)
Since then I've tried many ways to load it, or to load and re-save it (for example calling lora_model.merge_and_unload(), or simply local_model = AutoModelForCausalLM.from_pretrained(after_merge_model_path), and more), but everything resulted in errors, sometimes the same ones and sometimes different ones.
I also opened a question on the HuggingFace Forum, in case it is better suited there.
Fine-tuning here trains adapters on top of the base model, and after training you save only the adapter, not the base model. So the workflow is as follows:
During training:
- download the base model from HF; it is stored in the cache directory
- train the PEFT adapter and save it
During inference:
- load the cached HF base model
- load the saved PEFT adapter and apply it to the base model
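Step 1. Download the HF model into a predefined cache directory:
import os
# set the cache location before importing transformers so the setting takes effect
os.environ['HF_HOME'] = '/content/assets/hf_cache/'
os.environ['HF_DATASETS_CACHE'] = '/content/assets/hf_datasets/'
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
hf_model_name = "tiiuae/falcon-7b-instruct"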
# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_name,
trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token
# load the model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
hf_model_name,
quantization_config=bnb_config,
device_map={"": 0},
trust_remote_code=False
)
...
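To make sure later sessions read the base model from that cache instead of contacting the Hub, the cache setup can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the helper name configure_hf_cache is hypothetical, and it assumes the model was already downloaded once into the cache directory. HF_HUB_OFFLINE is the environment variable that huggingface_hub reads to disable network access, and it has to be set before transformers is imported.

```python
import os

# Cache location taken from the step above; adjust to your setup.
CACHE_DIR = "/content/assets/hf_cache/"


def configure_hf_cache(cache_dir: str, offline: bool) -> None:
    """Point the Hugging Face libraries at a fixed cache directory.

    With offline=True, HF_HUB_OFFLINE=1 asks huggingface_hub to use only
    files that are already cached. Call this before importing transformers.
    """
    os.environ["HF_HOME"] = cache_dir
    if offline:
        os.environ["HF_HUB_OFFLINE"] = "1"
    else:
        # Remove the flag so downloads are allowed again.
        os.environ.pop("HF_HUB_OFFLINE", None)


# First run: offline=False so the base model can be downloaded.
# Later runs: offline=True so only the cached copy is used.
configure_hf_cache(CACHE_DIR, offline=True)
```

If the model is missing from the cache while offline mode is on, from_pretrained should fail with an error rather than download, which makes a misconfigured cache path easy to spot.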
After training, save the PEFT adapter:
from pathlib import Path
# ... train the model ...
train_result = trainer.train()
dir_path = Path('/content')
adapter_final_path = dir_path / "output" / "final_adapter"
# model is the PEFT-wrapped model, so this writes only the adapter weights and config
model.save_pretrained(adapter_final_path)
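A quick way to confirm that only the adapter was saved is to inspect the output directory. The sketch below is illustrative: describe_adapter is a hypothetical helper, and it assumes the usual PEFT layout, an adapter_config.json next to a weights file named adapter_model.safetensors (or adapter_model.bin in older PEFT versions).

```python
import json
from pathlib import Path

# File names PEFT is expected to write (assumption: standard PEFT layout).
CONFIG_NAME = "adapter_config.json"
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def describe_adapter(adapter_dir):
    """Return the base model name and weights file found in a saved adapter directory."""
    adapter_dir = Path(adapter_dir)
    config_path = adapter_dir / CONFIG_NAME
    if not config_path.is_file():
        raise FileNotFoundError(f"{CONFIG_NAME} not found in {adapter_dir}")

    weights = [name for name in WEIGHT_NAMES if (adapter_dir / name).is_file()]
    if not weights:
        raise FileNotFoundError(f"no adapter weights found in {adapter_dir}")

    # The adapter config records which base model the adapter was trained on.
    config = json.loads(config_path.read_text())
    return {
        "base_model": config.get("base_model_name_or_path"),
        "weights_file": weights[0],
    }
```

Calling describe_adapter(adapter_final_path) after saving should report "tiiuae/falcon-7b-instruct" as the base model. That recorded name is why the base model still has to be loaded separately at inference time.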
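During inference, reload the base model and the PEFT adapter:
import os
# use the same cache directory as in Step 1 so the base model is not downloaded again
os.environ['HF_HOME'] = '/content/assets/hf_cache/'
from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# peft must also be installed: model.load_adapter below relies on it
dir_path = Path('/content')
adapter_final_path = dir_path / "output" / "final_adapter"
hf_model_name = "tiiuae/falcon-7b-instruct"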
# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_name,
trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token
# load the model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
hf_model_name,
quantization_config=bnb_config,
device_map={"": 0},
trust_remote_code=False
)
# apply the saved adapter to the model (needs a transformers version with PEFT integration)
model.load_adapter(str(adapter_final_path))
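The same result can also be reached through the peft library directly, using PeftModel.from_pretrained on an already loaded base model. The following is a rough sketch under assumptions: load_finetuned is a hypothetical wrapper, the early file check is my own addition, and the quantization settings from the snippets above are left out for brevity, so pass them to from_pretrained yourself if you need 4-bit loading.

```python
from pathlib import Path


def load_finetuned(base_model_name: str, adapter_dir: str):
    """Load the cached base model, then attach the saved PEFT adapter to it."""
    adapter_path = Path(adapter_dir)
    # Fail early, before any heavy import or model load, if the adapter is missing.
    if not (adapter_path / "adapter_config.json").is_file():
        raise FileNotFoundError(f"no adapter_config.json in {adapter_path}")

    # Heavy imports are deferred so the path check above stays cheap.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Served from the local cache once the base model has been downloaded.
    base = AutoModelForCausalLM.from_pretrained(base_model_name, device_map={"": 0})
    # Wrap the base model with the adapter weights stored in adapter_dir.
    return PeftModel.from_pretrained(base, str(adapter_path))
```

Usage would look like model = load_finetuned("tiiuae/falcon-7b-instruct", "/content/output/final_adapter"). Like the snippet above, it needs a GPU because of device_map={"": 0}.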