Is it necessary for the torch_dtype used when loading a model and the precision of the trainable weights to be different? If so, why?
According to this comment in the huggingface/peft package, if a model is loaded in fp16, the trainable weights must be cast to fp32. From this comment, I understand that, in general, the torch_dtype used to load a model and the precision of the trainable weights must differ. Why is it necessary to change the precision? Also, does this principle apply to both fine-tuning and continual pretraining?
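For context, here is a minimal sketch of the setup I am asking about: the base model is loaded in fp16 and only the trainable (LoRA) parameters are upcast to fp32. The model name, LoRA hyperparameters, and the manual upcasting loop are illustrative assumptions, not the exact code from peft.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model with half-precision weights (illustrative model name)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)

# Attach LoRA adapters; only these adapter weights will be trainable
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

# Cast only the trainable parameters up to fp32, in the spirit of the
# comment in peft: the frozen base stays in fp16, the adapters train in fp32
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)
```

My question is why this upcasting of the trainable weights is needed at all, and whether the same reasoning applies when the trained weights are the full model (continual pretraining) rather than a small adapter.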
How to handle the loss decreasing at some points but bouncing back and continuing to increase when fine-tuning LLaMA?
Preface