Hi, LangChain newb here!
I am tasked with setting up an AI assistant for the app of a fictional theater, let's call it SignStage. It has two halls, A and B; each play is staged in a specific hall twice a day, once in the afternoon and once at night, over a specific predetermined date range.
When prompted, I want the assistant to help the app's users with information about each play: price, date and time, ticket availability, and viewer age limit. I would also like the assistant to perform certain actions when prompted by the user, like booking or canceling tickets, filing a complaint, maybe rating a play… (although that is not such a priority right now, since the basic retrieval part does not seem to be working). I have thought of tool/function calling for this, but I think everything related to that is behind an API paywall, please correct me if I am wrong. (I have seen that the Cohere API has a free access token, but its use is pretty limited; I have used it before on another project.)
Sorry for the rambling…
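For reference, what I had in mind for the actions part is something like this (just a rough sketch with a made-up book_ticket function; nothing is wired to the model yet):

from langchain.tools import tool

@tool
def book_ticket(play_title: str, date: str, showtime: str, quantity: int) -> str:
    """Book tickets for a play at Sign Stage (hypothetical placeholder, no real backend)."""
    # In the real app this would call SignStage's booking backend.
    return f"Booked {quantity} ticket(s) for '{play_title}' on {date} ({showtime})."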
As I said before, I am using the Mistral 7B Instruct model as my main chat model, and I have created the following chain (I followed a Medium article that explained this approach):
I have created the following two pipelines that I chain together:
standalone_query_generation_pipeline = pipeline(
    model=mistral_model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.0,
    do_sample=False,
    # top_p=0.0,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=1000,
)
standalone_query_generation_llm = HuggingFacePipeline(pipeline=standalone_query_generation_pipeline)
response_generation_pipeline = pipeline(
    model=mistral_model,
    tokenizer=tokenizer,
    task="text-generation",
    # temperature=0.2,
    do_sample=True,
    # top_p=0.0,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=5000,
)
response_generation_llm = HuggingFacePipeline(pipeline=response_generation_pipeline)
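Just in case it is relevant, this is how I would smoke-test each wrapped LLM on its own before chaining them (a quick sketch, the prompt is arbitrary):

# Quick smoke test of each wrapped LLM in isolation (sketch)
test_prompt = "[INST] In one sentence, what is a standalone question? [/INST]"
print(standalone_query_generation_llm.invoke(test_prompt))
print(response_generation_llm.invoke(test_prompt))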
Then I chain them together like this, while also creating the memory for the chat history and plugging in the retriever (which I describe further down):
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")
def _combine_documents(docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)
# Instantiate ConversationBufferMemory
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

# First we add a step to load memory
# This adds a "chat_history" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    # The rephrased question, used both for the final prompt and to query the DB for relevant documents
    "standalone_question": {
        "question": lambda x: x["question"],  # Original question
        "chat_history": lambda x: get_buffer_string(x["chat_history"]),  # Chat history, used as context to rephrase the user's question for querying the DB
    }
    | CONDENSE_QUESTION_PROMPT
    | standalone_query_generation_llm,
}

# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,  # Retrieved documents go here
    "question": lambda x: x["standalone_question"],  # Rephrased question goes here
}

# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),  # Retrieved documents, formatted into one string
    "question": itemgetter("question"),  # Final version of the question
}

# And finally, the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | response_generation_llm,  # The answer generated by the LLM from the context and the final version of the question
    "question": itemgetter("question"),
    "context": final_inputs["context"],
}

final_chain = loaded_memory | standalone_question | retrieved_documents | answer
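For debugging, I figure the intermediate steps of the chain can be inspected roughly like this (a sketch, the question is just an example):

# Inspect intermediate outputs of the chain (debugging sketch)
sample = {"question": "What are the plays being staged in Hall A this week?"}

# 1) What does the rephrasing step actually produce?
print((loaded_memory | standalone_question).invoke(sample))

# 2) Which documents does the retriever return for that standalone question?
print((loaded_memory | standalone_question | retrieved_documents).invoke(sample)["docs"])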
I invoke the chain through this function:
def call_conversational_rag(question, chain, memory):
    """
    Calls a conversational RAG (Retrieval-Augmented Generation) model to generate an answer to a given question.

    This function sends a question to the RAG model, retrieves the answer, and stores the question-answer pair in memory
    for context in future interactions.

    Parameters:
        question (str): The question to be answered by the RAG model.
        chain (LangChain object): An instance of a LangChain chain which encapsulates the RAG model and its functionality.
        memory (Memory object): An object used for storing the context of the conversation.

    Returns:
        dict: A dictionary containing the generated answer from the RAG model.
    """
    # Prepare the input for the RAG model
    inputs = {"question": question}
    # Invoke the RAG model to get an answer
    result = chain.invoke(inputs)
    print(result)
    # Save the current question and its answer to memory for future context
    memory.save_context(inputs, {"answer": result["answer"]})
    # Return the result
    return result
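And a typical call would look like this (the question is just a sample):

result = call_conversational_rag("What are the plays being staged in Hall A this week?", final_chain, memory)
print(result["answer"])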
Maybe I should include the templates I am using for each step in the chain:
prompt_template = """
[INST]
Given a conversation and a follow-up question, rephrase the follow-up question to be a standalone question, in its original language, that can be used to query an index containing information about all the plays being staged on Sign Stage. This query will be used to retrieve documents/play info with additional context. Each play has the following info: Title, Genre, Runtime, Description, Headline, Age limit, Tickets Sold, Tickets Available, Dates and Time.
The following are just EXAMPLES, this is NOT the actual chat history:
If you do not see any chat history, just return the "follow-up" question as is:
Chat History:
Follow Up Input: What are the plays being staged in Hall A this week?
Standalone Question: What are the plays being staged in Hall A this week?
If this is the second question onwards (meaning that it's indeed a follow-up question), you should properly rephrase the question, leveraging the context that the chat history has to offer.
Chat History:
Human: What are the plays being staged in Hall A this week?
AI: This week, Hall A is featuring the musical “Wicked” from Monday to Saturday, with afternoon and evening performances each day.
Follow Up Input: What are the ticket prices for it?
Standalone Question: What are the ticket prices for the musical Wicked in Hall A at Sign Stage?
NOW take the ACTUAL chat history and input question and do as instructed:
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone Question:
[your response here]
[/INST]
"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(prompt_template)
template = """
[INST]
Preamble:
You are a polite and helpful AI assistant for a theater called Sign Stage. Your role is to assist users in booking tickets, providing information about the plays, canceling reservations, and handling general inquiries related to the theater. You will retrieve all the information that you need about the plays being staged at Sign Stage by querying the index.
Sign Stage has two halls: Hall A and Hall B. Each hall hosts a specific play every day, with one afternoon performance and one night performance, for a specific date range. All ticket prices are in euros (EUR). Keep in mind that Hall A also offers sign language interpreters, while both halls offer supertitles so that hearing-impaired people can read what is being said by the actors. All plays are performed in Greek.
When responding, do not mention to the user that you retrieved any context; just say 'Based on my knowledge'. Decline to answer questions unrelated to the theater Sign Stage, its plays and their information. If you don't know the answer, just say that you don't know, don't try to make up an answer. Maybe ask for more context or prompt the user to try again.
Respond based on the information provided above and the retrieved context below (information about plays to be staged on Sign Stage) and on your knowledge of Sign Stage:
{context}
Question: {question}
Your helpful answer here:
[/INST]
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
When I ask my model a question directly through the generate function, its responses are pretty lucid. I did that to check that the quantization I applied when loading it did not kill its accuracy.
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = 'mistralai/Mistral-7B-Instruct-v0.1'
model_config = transformers.AutoConfig.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

use_4bit = True
bnb_4bit_compute_dtype = torch.bfloat16
bnb_4bit_quant_type = "nf4"
use_nested_quant = False

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=bnb_4bit_compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

mistral_model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="cuda")
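The direct sanity check I mentioned looks roughly like this (a sketch, the prompt is just an example):

# Rough sketch of the direct-generation sanity check mentioned above
prompt = "[INST] What is the capital of France? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(mistral_model.device)
output_ids = mistral_model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))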
My two biggest problems are: (1) with the way I have built the index, when I do a search I do not get back the most relevant documents, or at least not all of them, and I don't know why; and (2) even when I do, Mistral sometimes seems to get overwhelmed because the prompt becomes too large.
The thing is, I have seen other people make this work respectably with more difficult documents and without much effort, so I don't know what I am missing.
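To see whether the problem is retrieval or generation, the retriever (set up further down) can be queried directly, something like this (sketch, the question is an example):

# Query the retriever directly to check what actually comes back (sketch)
hits = retriever.invoke("Which plays are staged in Hall A?")
for doc in hits:
    print(doc.metadata, "|", doc.page_content[:100])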
For my documents:
Using Claude, I created a JSON file containing information about 21 plays, in this format:
[
{
"title": "The Time Traveler's Wife",
"headline": "A Timeless Love Story That Transcends All Boundaries",
"description": " Based on the best-selling novel, this touching drama follows the lives of Henry and Clare, a couple whose love defies the constraints of time itself. As Henry struggles with a rare genetic disorder that causes him to unpredictably travel through time, Clare must learn to navigate the complexities of their unique relationship.",
"cast": [
"Emily Blunt",
"John Krasinski",
"Theo James"
],
"genre": "Drama",
"runtime": "2h 10min",
"dateRange": {
"start": "2024-06-26",
"end": "2024-07-06"
},
"afternoon": {
"time": "18:00",
"price": 35,
"special_price": 18
},
"night": {
"time": "22:00",
"price": 45,
"special_price": 23
},
"regularTickets": {
"available": 220,
"sold": 180
},
"specialNeedsTickets": {
"available": 11,
"sold": 8
},
"totalTickets": {
"available": 231,
"sold": 188
},
"hall": "Hall A",
"ageLimit": "18+"
},
{
"title": "The Phantom of the Opera",
"headline": "The Legendary Musical That Will Haunt Your Soul",
"description": "Andrew Lloyd Webber's iconic musical tells the haunting tale of a disfigured musical genius who haunts the Paris Opera House and falls madly in love with a young soprano. With its timeless score and powerful story, this production is a must-see for all theater lovers.",
"cast": [
"Gerard Butler",
"Emmy Rossum",
"Patrick Wilson"
],
"genre": "Musical",
"runtime": "2h 30min",
"dateRange": {
"start": "2024-06-26",
"end": "2024-07-06"
},
"afternoon": {
"time": "18:00",
"price": 40,
"special_price": 20
},
"night": {
"time": "22:00",
"price": 50,
"special_price": 25
},
"regularTickets": {
"available": 150,
"sold": 120
},
"specialNeedsTickets": {
"available": 8,
"sold": 5
},
"totalTickets": {
"available": 158,
"sold": 125
},
"hall": "Hall B",
"ageLimit": "16+"
},
...
]
I also tried using a JSONLoader to load the records straight into documents, but I decided against the idea, as I thought it would be more beneficial for the LLM if the info were in coherent text:
import json

# Load the JSON data (a list of play records)
with open('../input/playsdb/playsDB.json', 'r') as f:
    data = json.load(f)
For each play in the JSON, I create a string representation of the information it contains plus a dictionary of metadata (although I have not understood if, and how, I can use the metadata in my search):
# Generate documents and metadata
info = generate_play_descriptions_and_metadata(data)
documents = info[0]
metadata = info[1]
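The helper itself is basically a loop over the JSON records, using the generate_play_description function I paste further down; a simplified sketch (the exact metadata fields shown here are just a guess at what might be useful):

def generate_play_descriptions_and_metadata(plays):
    """Build a text description plus a small metadata dict per play (simplified sketch)."""
    descriptions, metadata = [], []
    for play in plays:
        descriptions.append(generate_play_description(play))
        metadata.append({
            "title": play["title"],
            "hall": play["hall"],
            "genre": play["genre"],
            "ageLimit": play["ageLimit"],
        })
    return descriptions, metadata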
I am using the RecursiveCharacterTextSplitter, as I read it is considered the best starting point for generic text. I chunk my data like this: one chunk -> one play description, each a bit less than 1205 characters, so each Document ends up being one play.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1205, chunk_overlap=0, length_function=len)

chunks = []
for doc in documents:
    chunks.append(text_splitter.split_text(doc))
print(len(chunks))

docs = []
# Add metadata to each play
for idx, chunk in enumerate(chunks):
    docs.append(Document(page_content=chunk[0], metadata=metadata[idx]))
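Since every play description already fits in a single chunk, I suspect the splitter is not really doing anything here, and the same docs could probably be built directly, something like:

# Alternative without the splitter: one Document per play
docs = [
    Document(page_content=desc, metadata=meta)
    for desc, meta in zip(documents, metadata)
]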
I use the following vector store with the embedding function below. What search function should I use for my data: similarity search with cosine similarity, Euclidean distance, or MMR?
# Load Embedding Model
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5", model_kwargs=model_kwargs, encode_kwargs=encode_kwargs) # BAAI/bge-large-en-v1.5 | sentence-transformers/all-mpnet-base-v2 | all-MiniLM-L6-v2
# Create Index
db = Chroma.from_documents(docs, embeddings)
# Instantiate a retriever to query the index ("k" has to go inside search_kwargs, not as a bare keyword argument)
retriever = db.as_retriever(search_kwargs={"k": 5})
# retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.1, 'k': 21})
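For what it's worth, these are the other retriever settings I am aware of (rough sketches; the metadata filter assumes "hall" is one of my metadata keys):

# MMR: relevant but less redundant documents
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 21})

# Similarity search with a metadata filter (assumes "hall" exists in the metadata)
retriever = db.as_retriever(search_kwargs={"k": 5, "filter": {"hall": "Hall A"}})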
Each document’s page_content is created like this:
def generate_play_description(doc):
    # Generate a detailed description of a play based on the provided dictionary.
    title = doc["title"]
    headline = doc["headline"]
    description = doc["description"]
    cast = ", ".join(doc["cast"])
    genre = doc["genre"]
    runtime = doc["runtime"]
    date_range = f"{doc['dateRange']['start']} - {doc['dateRange']['end']}"
    afternoon_time = doc["afternoon"]["time"]
    afternoon_price = doc["afternoon"]["price"]
    afternoon_special_price = doc["afternoon"]["special_price"]
    night_time = doc["night"]["time"]
    night_price = doc["night"]["price"]
    night_special_price = doc["night"]["special_price"]
    regular_tickets_available = doc["regularTickets"]["available"]
    regular_tickets_sold = doc["regularTickets"]["sold"]
    special_needs_tickets_available = doc["specialNeedsTickets"]["available"]
    special_needs_tickets_sold = doc["specialNeedsTickets"]["sold"]
    total_tickets_available = doc["totalTickets"]["available"]
    total_tickets_sold = doc["totalTickets"]["sold"]
    hall = doc["hall"]
    age_limit = doc["ageLimit"]

    play_description = f"""Information about play with title: {title}
Headline: {headline}
Description: {description}
Cast: {cast}
Genre: {genre}
Runtime (Duration) of the play is {runtime}.
Performed two times per day for each day in this Date Range: {date_range}.
Showtimes:
- For the afternoon timezone, the play starts at {afternoon_time} and the regular tickets are priced at {afternoon_price} €, while the special needs tickets are priced at {afternoon_special_price} €
- For the night timezone, the play starts at {night_time} and the regular tickets are priced at {night_price} €, while the special needs tickets are priced at {night_special_price} €
Ticket Availability:
Regular Tickets:
- Available: {regular_tickets_available}
- Sold: {regular_tickets_sold}
Special Needs Tickets:
- Available: {special_needs_tickets_available}
- Sold: {special_needs_tickets_sold}
Total Tickets:
- Available: {total_tickets_available}
- Sold: {total_tickets_sold}
Performed only on {hall}.
Age Limit is {age_limit}. So this play is allowed for individuals that are {age_limit.replace("+", "")} or older.
"""
    return play_description
Guys, I have been stuck on this for 5 days and everything I have tried does not seem to work. Am I making some completely newbie mistake that breaks my whole setup?
If you have read this far down, thank you for your patience.
I wasn't sure what information to include; I just thought all of this was crucial and that the mistake lies somewhere in here (I hope). So sorry for the chaos…