I am trying to make a Gradio chatbot in Hugging Face Spaces using the Mistral-7B-v0.1 model. As this is a large model, I have to quantize it, otherwise the free 50 GB of storage fills up. I am using bitsandbytes to do so, but I get an ImportError.
This is the HF Space URL – text
Traceback –
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Traceback (most recent call last):
File "/home/user/app/app.py", line 15, in <module>
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=quantization_config, device_map="auto", token=access_token)
File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3165, in from_pretrained
hf_quantizer.validate_environment(
File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
raise ImportError(
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
Note – I am using the free CPU hardware with 16 GB RAM, so torch is not compiled with GPU support.
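To confirm the environment, this is roughly the check I run from a console inside the Space (the exact version strings are just what I expect to see, not guaranteed):
import torch
import accelerate
import bitsandbytes as bnb

print(torch.__version__)           # CPU-only build, e.g. something ending in "+cpu"
print(torch.cuda.is_available())   # False – the free tier has no GPU
print(accelerate.__version__)
print(bnb.__version__)             # should match the pin in requirements.txt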
I have added both accelerate and bitsandbytes to requirements.txt (text).
I have also tried pinning bitsandbytes to bitsandbytes==0.43.1 (which I think is the latest version), but it did not solve the problem.
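For reference, my requirements.txt is roughly the following (the entries other than accelerate and bitsandbytes are listed from memory, so the exact contents may differ):
torch
transformers
accelerate
bitsandbytes==0.43.1
gradio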
Below is the full code (app.py):
import os

import bitsandbytes as bnb
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

access_token = os.environ["GATED_ACCESS_TOKEN"]

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=quantization_config,
    device_map="auto",
    token=access_token,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Function to generate text using the model
def generate_text(prompt):
    text = prompt
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Create the Gradio interface
iface = gr.Interface(
    fn=generate_text,
    inputs=[
        gr.inputs.Textbox(lines=5, label="Input Prompt"),
    ],
    outputs=gr.outputs.Textbox(label="Generated Text"),
    title="MixTRAL 8x22B Text Generation",
    description="Use this interface to generate text using the MixTRAL 8x22B language model.",
)

# Launch the Gradio interface
iface.launch()