Model fine-tuning: vanishing gradient problem
I am fine-tuning Mistral-7B with Hugging Face PEFT and quantization. In my training loop I print the gradient values for each batch, and they look unusual: many of them appear to be vanishing toward zero.
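For reference, here is a minimal sketch of the kind of setup I mean (the checkpoint name, LoRA hyperparameters, and the dummy batch are placeholders, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint

# 4-bit quantization via bitsandbytes; bf16 compute avoids the fp16
# underflow that can make small gradients round to zero
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables input grads

# LoRA adapter; only these injected matrices are trainable
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

tokenizer = AutoTokenizer.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# single dummy batch standing in for the real dataloader
batch = tokenizer("example training text", return_tensors="pt").to(model.device)
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()

# The frozen quantized base weights have .grad == None; only the LoRA
# parameters receive gradients, so inspect just those
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.norm().item())

optimizer.step()
optimizer.zero_grad()
```

One thing worth noting in a setup like this: since the quantized base weights are frozen, printing gradients over all parameters shows mostly `None`/zeros by design, which can look like vanishing gradients even when the LoRA parameters are training normally.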