Why does 8-bit quantized fine-tuning of a model occupy more memory than fine-tuning the original model?
I am trying to fine-tune a Mistral model for 5 epochs. It shows an estimate of 72 hours with 8-bit quantized fine-tuning but 48 hours with plain fine-tuning of the original model. The memory footprint is also higher for the 8-bit quantized run. Below is the code where I am loading the model for 8-bit quantization.
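A minimal sketch of that loading path, assuming the standard `BitsAndBytesConfig` route and the `mistralai/Mistral-7B-v0.1` checkpoint (the exact model id and arguments in the original snippet may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization via bitsandbytes, configured through transformers
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # assumed model id
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)
```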
ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.21.0`
I am trying to fine-tune a pretrained BERT model with the Transformers `Trainer`, and I use `TrainingArguments` to set some hyperparameters.
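A minimal sketch of the kind of setup that triggers this check (output directory and hyperparameter values are placeholders):

```python
from transformers import TrainingArguments

# Constructing TrainingArguments is already enough to hit the dependency
# check: under PyTorch, Trainer/TrainingArguments require accelerate>=0.21.0,
# so a missing or outdated accelerate raises the ImportError above.
training_args = TrainingArguments(
    output_dir="out",                 # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
```

Upgrading with `pip install -U "accelerate>=0.21.0"` and restarting the kernel or runtime usually resolves this error.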
Transformers: AutoModel from_pretrained instantiation error
I’m instantiating a CodeBERT model using AutoModel.from_pretrained.
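A sketch of the typical instantiation, assuming the `microsoft/codebert-base` checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

# from_pretrained (snake_case) is the factory method on the Auto classes;
# a camelCase spelling such as fromPretrained raises an AttributeError.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
```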
AttributeError: module 'torch.utils._pytree' has no attribute 'register_pytree_node'
I use PyTorch 2.0.0 and transformers 4.41.0.
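A quick sanity check for this combination, assuming the error comes from the version mismatch (transformers 4.41 calls the public `register_pytree_node` helper, which older torch builds such as 2.0.0 do not expose; they only ship the private `_register_pytree_node`):

```python
import torch
import torch.utils._pytree as pytree
import transformers

print(torch.__version__, transformers.__version__)
print(hasattr(pytree, "register_pytree_node"))    # expected False on torch 2.0.0
print(hasattr(pytree, "_register_pytree_node"))   # expected True on torch 2.0.0
```

Upgrading torch to a release compatible with transformers 4.41, or pinning an older transformers release, typically resolves the mismatch.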