Preface
I am new to fine-tuning LLMs for binary classification tasks. I have tried fine-tuning LLaMA 2 and LLaMA 3 on a causal language modeling objective (next-token prediction) with PEFT LoRA and 4-bit quantization. The model has been trained on prompts of the following forms:
${Text}. ${Question}. ${Answer (Positive class or negative class)}
${Some context}. ${Question}. ${Answer (Positive class or negative class)}
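For context, this is roughly what my formatting function looks like (a minimal sketch; the column names "text", "question", and "answer" are assumptions and depend on the dataset schema):

```python
def formatting_func(example):
    # With packing=True, SFTTrainer calls this once per example and expects a single string.
    # Column names here are assumptions; adjust to the actual dataset fields.
    return f"{example['text']}. {example['question']}. {example['answer']}"
```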
Here are the training arguments:
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 100
logging_steps = 10
learning_rate = 1e-4
max_grad_norm = 0.3 # I have tried to use default max_grad_norm
max_steps = 1000
warmup_ratio = 0.03 # I have tried to use default warmup_ratio
lr_scheduler_type = "cosine_with_restarts" # I have tried it using "cosine" too
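For completeness, these values are assembled into a `transformers.TrainingArguments` object roughly like this (a sketch; `output_dir` is a placeholder and the `fp16` flag is an assumption about my hardware):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",                    # placeholder
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=100,
    logging_steps=10,
    learning_rate=1e-4,
    max_grad_norm=0.3,
    max_steps=1000,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,                                 # assumption: mixed-precision training
)
```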
Here is the trainer:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=dataset,
    packing=True,                    # pack examples into constant-length sequences
    dataset_text_field="id",
    tokenizer=tokenizer,
    max_seq_length=1024,
    formatting_func=formatting_func, # builds the prompt string per example
)
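The model and LoRA setup referenced above (4-bit quantization via bitsandbytes plus a PEFT LoRA config) is, in sketch form, something like the following. The model name, LoRA rank/alpha/dropout, and target modules are assumptions, not my exact values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder

# 4-bit NF4 quantization config (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA hyperparameters below are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
# peft_config can then be passed to SFTTrainer via its peft_config argument.
```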
Issues and Questions
The issue is in the results: the training loss decreases for a number of steps, but after a certain point it keeps increasing (I assume LLaMA uses cross-entropy as its loss function). Is the issue with the model I am using (LLaMA), or could something else be causing it, and how can I fix it?
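To make the assumption about the loss explicit: the value logged during training is the mean token-level cross-entropy of next-token prediction. A small sketch, reusing the `model` and `tokenizer` from the setup above, that surfaces that loss directly:

```python
import torch

# Passing labels=input_ids makes the model shift the labels internally by one
# position, so the returned loss is the cross-entropy of predicting each next token.
inputs = tokenizer("Some context. Question? Positive", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # mean cross-entropy over predicted tokens
```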