I’m trying to fine-tune the pre-trained Meta-Llama-3-8B-Instruct model from Hugging Face on my own data. As a very first step, I’m just trying to interact with the model as is.
My system specs:
Brand: HP Spectre x360 - 15-df0013dx
Processor: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz
Installed RAM: 16.0 GB (15.7 GB usable)
OS: Windows 10 64-bit Home, Version 22H2
OS build: 19045.4529
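For context on how tight this machine is, here is a back-of-envelope estimate of the memory the weights alone need in float16 (the dtype I load with below). This is only a sketch: the parameter count of roughly 8.03 billion is my assumption for an "8B" model, and I am assuming that the "GB" Windows reports are binary gigabytes (GiB).

```python
# Back-of-envelope estimate of the RAM needed just for the model weights.
# ASSUMPTION: ~8.03e9 parameters (approximate figure for an "8B" model).
NUM_PARAMS = 8.03e9
BYTES_PER_PARAM = 2      # float16 stores each parameter in 2 bytes
USABLE_RAM_GIB = 15.7    # "usable" RAM from the specs above (assumed to be GiB)


def weights_size_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


weights_gib = weights_size_gib(NUM_PARAMS, BYTES_PER_PARAM)
headroom_gib = USABLE_RAM_GIB - weights_gib
print(f"weights: {weights_gib:.1f} GiB, headroom: {headroom_gib:.1f} GiB")
```

Under these assumptions the weights take nearly all of the usable RAM, leaving well under 1 GiB for the OS, activations, and everything else, so I would expect some of the model to be offloaded.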
This is my app.py:
import os
import torch
import logging
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

try:
    # Configure logging
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s - %(levelname)s - %(message)s')

    # Ensure the path works on both Windows and Linux
    model_path = os.path.join(os.path.dirname(__file__), "results", "Meta-Llama-3-8B-Instruct")

    # Verify the model_path type
    if not isinstance(model_path, (str, os.PathLike)):
        raise ValueError("model_path must be a string or os.PathLike object")

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype=torch.float16
    )
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    generation_config = GenerationConfig.from_pretrained(model_path)
    generation_config.max_new_tokens = 128
    generation_config.repetition_penalty = 1.18
    generation_config.temperature = 0.0000001

    text_input = "The most important person in AI is"
    encoding = tokenizer(text_input, return_tensors="pt")
    encoding = encoding.to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )

    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(output_text)
except Exception as e:
    logging.error(f"Error generating response: {e}")
Upon running app.py, I get an error regarding tensor size:
Error generating response: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
(bc_env) C:Usersraulmllmbc> python .\app.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:29<00:00, 7.33s/it]
2024-07-08 20:53:37,708 - WARNING - Some parameters are on the meta device device because they were offloaded to the cpu and disk.
2024-07-08 20:55:35,675 - ERROR - Error generating response: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
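The log shows only the error message because my `except` block logs `e` and discards the traceback, so I can't tell which call raised it. A minimal sketch of how I could capture the full traceback with `logging.exception` instead; `run_with_traceback` and `failing_step` are hypothetical names, and `failing_step` is just a stand-in for the real `model.generate(...)` call:

```python
import io
import logging


def run_with_traceback(fn, logger: logging.Logger) -> bool:
    """Call fn(); on failure, log the message plus the full traceback."""
    try:
        fn()
        return True
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Error generating response")
        return False


def failing_step():
    # Hypothetical stand-in for model.generate(...); raises a similar error.
    raise RuntimeError("Sizes of tensors must match except in dimension 1.")


# Log into an in-memory buffer so the captured output can be inspected.
log_buffer = io.StringIO()
demo_logger = logging.getLogger("traceback_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(log_buffer))

succeeded = run_with_traceback(failing_step, demo_logger)
captured = log_buffer.getvalue()
print(captured)
```

In app.py, the equivalent change would be replacing `logging.error(f"Error generating response: {e}")` with `logging.exception("Error generating response")`, which should show the file and line where the tensor size mismatch is raised.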
For reference, I’m following this tutorial, which actually uses Llama 2 (in case that matters):
https://www.mlexpert.io/bootcamp/llms-101
Any help is appreciated.
Thanks.