Related Content

Tag Archive for python, pytorch, nlp, huggingface-transformers, large-language-model

Is it necessary for the torch_dtype used when loading a model and the precision of the trainable weights to be different? If so, why?

According to this comment in the huggingface/peft package, if a model is loaded in fp16, the trainable weights must be cast to fp32. I take this to mean that, in general, the torch_dtype used when loading a model must differ from the precision used for training. Why is it necessary to change the precision? Also, does this principle apply to both fine-tuning and continual pretraining?
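For concreteness, here is a minimal sketch of the setup the comment describes, assuming a LoRA-style configuration with transformers and peft; the checkpoint name and LoRA hyperparameters are placeholders, and the final loop is an explicit version of the upcasting that the peft comment refers to, not the library's exact internal code:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the frozen base model in fp16 to save memory
# (placeholder checkpoint name; any causal LM works here).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)

# Attach LoRA adapters; only these adapter weights will be trainable.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)
model = get_peft_model(model, peft_config)

# Cast only the trainable parameters up to fp32 while the frozen base
# stays in fp16. Gradient updates to a small LoRA weight can be tiny,
# and in fp16 they may underflow to zero, so the trainable weights are
# kept in higher precision even though the forward pass runs in fp16.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)
```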
