I am trying to generate text from my fine-tuned Llama 3 model, which I load with PEFT's AutoPeftModelForCausalLM class, while also passing in previous message history.
This is how I am currently generating responses without previous message history:
def get_response(user_input):
    # Tokenize the single-turn prompt and move the tensors to the GPU
    inputs = tokenizer(["""Generate a response to the input
### Input:
{}
### Response:""".format(user_input)], return_tensors="pt").to("cuda")
    # Generate up to 32 new tokens and decode the full sequence
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    return response
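For a single turn this works as expected, for example:

print(get_response("What is the capital of France?"))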
And this is how I defined the model and tokenizer variables:
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("huggingface-user/outputs").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
I want to be able to pass in something like:
previous_messages = [
    {"role": "user", "content": message1},
    {"role": "assistant", "content": message2},
    {"role": "user", "content": message3},
    {"role": "assistant", "content": message4},
]
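The closest I have gotten is flattening the history into the same prompt template. This is only a sketch of what I mean; the get_response_with_history helper and the way I reuse the "### Input:" / "### Response:" labels for past turns are my own guesses, not a format I know the model was fine-tuned on:

def get_response_with_history(user_input, previous_messages):
    # Flatten each prior turn into the prompt, reusing the same
    # section labels as my single-turn template (this labeling is a guess)
    history = ""
    for message in previous_messages:
        label = "### Input:" if message["role"] == "user" else "### Response:"
        history += "{}\n{}\n".format(label, message["content"])
    prompt = """Generate a response to the input
{}### Input:
{}
### Response:""".format(history, user_input)
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=32)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

I don't know whether this actually matches what the model expects, though.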
Any help would be greatly appreciated!