High VRAM usage after loading peft weights with the base model
I fine-tuned Llama-3 using QLoRA and now want to run inference with it.
I use the code below for inference.