I’m working on a multi-query retrieval system using LangChain and FAISS for a project focused on meditation and spirituality. My current setup involves generating variations of a user query to improve document retrieval.
However, I’m encountering an issue where the retrieved documents are often unrelated to the topic of the query.
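Multi-query retrieval, as described above, boils down to three steps: generate several phrasings of the question, run a similarity search for each one, and merge the hits. The plain-Python sketch below shows only that flow; `multi_query_retrieve`, `fake_variations`, `fake_retrieve`, and the toy document IDs are hypothetical stand-ins for the LLM and the FAISS retriever, not LangChain code.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(
    question: str,
    generate_variations: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Run one search per generated variation and merge the hits,
    keeping the first occurrence of each document (hypothetical helper)."""
    merged: Dict[str, None] = {}  # dicts preserve insertion order
    for query in generate_variations(question):
        for doc_id in retrieve(query):
            merged.setdefault(doc_id, None)
    return list(merged)


# Toy stand-ins for the LLM and the vector store (assumptions for illustration)
toy_hits = {
    "How do I begin meditating?": ["doc_breathing", "doc_posture"],
    "What are the first steps in meditation?": ["doc_posture", "doc_timer"],
}


def fake_variations(question: str) -> List[str]:
    return list(toy_hits)


def fake_retrieve(query: str) -> List[str]:
    return toy_hits.get(query, [])


print(multi_query_retrieve("How to start a meditation?", fake_variations, fake_retrieve))
# ['doc_breathing', 'doc_posture', 'doc_timer']
```

LangChain's `MultiQueryRetriever` likewise returns the deduplicated union of the per-query results, so the quality of the final list depends on each individual search.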
For embeddings, I’m using the “nomic-embed-text” model, and I have embedded 335 PDF books about meditation and spirituality.
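Each chunk of those books is stored as one embedding vector, and a search embeds the query and returns the chunks whose vectors lie nearest to it; LangChain's FAISS wrapper uses Euclidean (L2) distance by default. The sketch below illustrates that ranking with made-up 3-dimensional vectors standing in for real nomic-embed-text embeddings (which are much larger); `l2_distance`, `top_k`, and the chunk names are hypothetical, and this is not how FAISS is implemented internally.

```python
import math
from typing import Dict, List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exhaustively score every stored vector and keep the k closest."""
    scored = [(chunk_id, l2_distance(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy "embeddings" (assumed values, for illustration only)
toy_index = {
    "chunk_breathing": [0.9, 0.1, 0.0],
    "chunk_posture": [0.7, 0.3, 0.1],
    "chunk_preface": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
print([chunk_id for chunk_id, _ in top_k(query_vec, toy_index, k=2)])
# ['chunk_breathing', 'chunk_posture']
```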
Here’s the code I’m using for generating query variations and retrieving documents:
```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.retrievers import MultiQueryRetriever
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser

# Embedding model (assumed to be the same one used to build "bookdb")
embedder = OllamaEmbeddings(model="nomic-embed-text")

# Load the FAISS database
new_db = FAISS.load_local("bookdb", embedder, allow_dangerous_deserialization=True)

# Define the prompt template for generating question variations
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Generate five variations of the following question, maintaining the core meaning but using different phrasings or perspectives. Ensure all variations remain highly relevant to the topic of meditation.
Original question: {question}""",
)

# Initialize the Ollama model
ollama = Ollama(model="llama3.1")

# Function to inspect the state
def inspect(state):
    """Print the state passed between Runnables in a LangChain chain and pass it on."""
    print(state)
    return state

# Define the chain for generating question variations.
# StrOutputParser takes no `key` argument; it returns the LLM output as one string.
question_variation_chain = LLMChain(
    llm=ollama,
    prompt=QUERY_PROMPT,
    output_parser=StrOutputParser(),
)

# Define the retriever
retriever = MultiQueryRetriever(
    retriever=new_db.as_retriever(),
    llm_chain=question_variation_chain,
)

# Example query
question = "How to start a meditation?"

# Run the retriever
results = retriever.invoke(question)
print("Number of documents retrieved:", len(results))

# Deduplicate by content, then sort by score.
# Note: FAISS's as_retriever() does not put a "score" key in metadata by default,
# so every sort key below is 0 and the original order is kept.
unique_results = {r.page_content: r for r in results}.values()
sorted_results = sorted(unique_results, key=lambda x: x.metadata.get("score", 0), reverse=True)
print(f"Number of unique documents found: {len(sorted_results)}")
for i, result in enumerate(sorted_results, 1):
    print(f"\nResult {i}:")
    print(f"Content: {result.page_content[:200]}...")  # Print first 200 characters
    print(f"Metadata: {result.metadata}")
```
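Sorting on `metadata.get('score', 0)` only works if something puts a score into the metadata. One way to get scores with LangChain's FAISS store is `similarity_search_with_score(query, k=...)`, which returns `(Document, float)` pairs where the float is an L2 distance, so lower means more similar. The plain-Python sketch below assumes such pairs have already been collected for every query variation; `Doc` is a hypothetical stand-in for LangChain's `Document`, and `dedupe_and_rank` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_and_rank(pairs: List[Tuple[Doc, float]]) -> List[Tuple[Doc, float]]:
    """Keep the smallest distance per unique page_content, then sort
    ascending, because a smaller L2 distance means a closer match."""
    best: Dict[str, Tuple[Doc, float]] = {}
    for doc, distance in pairs:
        current = best.get(doc.page_content)
        if current is None or distance < current[1]:
            best[doc.page_content] = (doc, distance)
    return sorted(best.values(), key=lambda pair: pair[1])


# Made-up (document, distance) pairs from two query variations
pairs = [
    (Doc("Sit comfortably and follow the breath."), 0.42),
    (Doc("Chapter 12: Notes on translation."), 0.95),
    (Doc("Sit comfortably and follow the breath."), 0.31),
]
for doc, distance in dedupe_and_rank(pairs):
    print(f"{distance:.2f}  {doc.page_content}")
```

Note that `reverse=True` in the original sort would put the *least* similar chunks first if the scores were L2 distances.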
The generated query variations look relevant, yet the retrieved documents often do not match the topic. For example, with the query “How to start a meditation?”, I get unrelated documents, even though all my books are about meditation and spirituality.
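One thing that may be worth checking: `MultiQueryRetriever` runs one search per generated query, so it needs the LLM output as a *list* of query strings, whereas `StrOutputParser` returns the whole response as a single string. Depending on the installed LangChain version, if that string is passed through where a list is expected, iterating over it yields individual characters, and each character would become its own search query. LangChain provides a line-splitting parser (`LineListOutputParser` in `langchain.retrievers.multi_query`), and `MultiQueryRetriever.from_llm(retriever=..., llm=..., prompt=...)` wires it in; the prompt would then also need to ask for one variation per line. The sketch below only illustrates the difference in plain Python; `parse_lines` is a hypothetical helper, not LangChain's parser.

```python
from typing import List


def parse_lines(text: str) -> List[str]:
    """Split raw LLM output into one query per non-empty line (hypothetical helper)."""
    return [line.strip() for line in text.split("\n") if line.strip()]


llm_output = "How do I begin meditating?\nWhat are the first steps in meditation?\n"

# Iterating over a plain string yields characters, not queries
print(list(llm_output)[:5])
# ['H', 'o', 'w', ' ', 'd']

# Splitting on newlines yields one query per line
print(parse_lines(llm_output))
# ['How do I begin meditating?', 'What are the first steps in meditation?']
```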