I am using SelfQueryRetriever with data stored in a collection on a ChromaDB server. A plain similarity search against that collection returns the expected results, but SelfQueryRetriever returns empty results for the same collection and query.
I have initialized SelfQueryRetriever and its parameters properly, and it identifies the target collection correctly, yet it still returns no results. My libraries are up to date. When I run the same code against a persistent ChromaDB instance, it works and returns the expected results.
So the issue seems to be isolated to the collection stored on the ChromaDB server.
Any insights or suggestions on how to resolve this would be greatly appreciated. Thank you!
The code I am working with is:
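import chromadb
from decouple import config  # assumption: config() comes from python-decouple


def get_chroma_instance():
    """Return an HTTP client for the Chroma server, failing fast if it is unreachable."""
    chroma_client = chromadb.HttpClient(
        host=config("CHROMA_HOST"), port=config("CHROMA_PORT")
    )
    try:
        chroma_client.heartbeat()
    except Exception as exc:
        raise Exception("Chroma server is not running") from exc
    return chroma_client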

def retrieve_documents(query, top_data=20):
    try:
        embedding_function = OpenAIEmbeddings(openai_api_key=open_api_key)
        llm = ChatOpenAI(temperature=0, openai_api_key=open_api_key)
        document_content_description = "Search judgements based on metadata"
        metadata_field_info = define_metadata_fields()
        # vectorstore=Chroma("test", OpenAIEmbeddings())
        vectorstore = Chroma(
            client=get_chroma_instance(),
            collection_name="test",
            embedding_function=embedding_function,
        )
        print("\n-----------------------------><--------------------------")
        # vectorstore = vectorstore._collection
        print("\n\n\n\nvectorstore = ", vectorstore._collection)
        print("\n-----------------------------><--------------------------")
        print("\n\n\n\nvectorstore = ", vectorstore)
        retriever = SelfQueryRetriever.from_llm(
            llm,  # You might need to replace with your LLM implementation
            vectorstore,
            document_content_description,
            metadata_field_info,
            search_kwargs={"k": top_data},
        )
        # Invoke retriever with the query
        docs = retriever.invoke(query)
        # Extract document URLs and IDs (modify based on your metadata structure)
        complete_docs = []
        for doc in docs:
            meta = doc.metadata
            # Assuming "source" field holds the document URL
            document_url = meta.get("source")
            if document_url:
                complete_docs.append({"source": document_url})
        return complete_docs
    except Exception as e:
        print("---------------------------------------------\nError in self query: ", e)
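
One check that might help narrow this down: SelfQueryRetriever turns the query into a metadata filter, so if the metadata stored on the server differs in field names or value types from what metadata_field_info describes, the filter could match nothing even though plain similarity search works. This is only a guess, not a confirmed cause. Below is a minimal sketch for inspecting what is actually stored. The helper name `summarize_metadata_types` and the sample data are hypothetical; the commented-out lines assume the same client and collection name as in the code above.

```python
from collections import defaultdict


def summarize_metadata_types(metadatas):
    """Map each metadata field name to the sorted type names seen for its values."""
    summary = defaultdict(set)
    for meta in metadatas:
        # Entries may be None for documents stored without metadata.
        for key, value in (meta or {}).items():
            summary[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in summary.items()}


# Hypothetical sample standing in for metadata fetched from the server.
sample = [
    {"source": "https://example.com/a", "year": 2020},
    {"source": "https://example.com/b", "year": "2021"},
    None,
]
print(summarize_metadata_types(sample))

# Against the real server (assumption: same client and collection as above):
# collection = get_chroma_instance().get_collection("test")
# metadatas = collection.get(limit=50, include=["metadatas"])["metadatas"]
# print(summarize_metadata_types(metadatas))
```

Running the same summary against the persistent instance that works and comparing the two outputs would show whether the server collection stores a field under a different name or type (for example a year stored as a string in one place and as an integer in the other).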