I am writing a Python program that imports JSON files into ChromaDB using LangChain, with the following code:
# (imports and context shown for completeness; db_directory, collection_name,
# embedding_function, json_object, doc_ids and state are set earlier in the
# enclosing function)
import logging
import uuid

from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter, RecursiveJsonSplitter

chroma_db = Chroma(
    persist_directory=db_directory,
    collection_name=collection_name,
    embedding_function=embedding_function,
    collection_metadata={"hnsw:space": "cosine"},
    relevance_score_fn=lambda distance: 1.0 - distance / 2,
)
docs = None
try:
    text_splitter = RecursiveJsonSplitter(max_chunk_size=2000)
    docs = text_splitter.create_documents(json_object)
except Exception:
    docs = None
    logging.error("Failed to parse document with JSON - attempting regular text splitter")
    state["persistent_logs"].append("Failed to parse document with JSON - attempting regular text splitter")
if docs is None:
    # Fall back to a plain-text splitter when JSON-aware splitting fails:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    docs = text_splitter.create_documents(json_object)
if doc_ids is None:
    doc_ids = [str(uuid.uuid4()) for _ in range(len(docs))]
else:
    # Check whether the documents already exist:
    result = chroma_db.get(doc_ids)
    if result is not None and len(result) > 0:
        # This is an update:
        chroma_db.update_documents(doc_ids, docs)
        return doc_ids
chroma_db.from_documents(docs, embedding_function, ids=doc_ids)
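For context, json_object is the parsed content of one of the JSON files, loaded along these lines (simplified for illustration; json_file_path stands in for my real path handling):

import json

with open(json_file_path) as f:  # json_file_path is a placeholder here
    json_object = json.load(f)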
What I see in the logs is that RecursiveJsonSplitter almost always fails, even though I have manually verified that the JSON object is valid, and the documents end up being inserted via the RecursiveCharacterTextSplitter fallback instead.
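As far as I can tell from the documentation, RecursiveJsonSplitter.create_documents() takes a list of dicts; a minimal standalone sketch of the call (the sample dict below is made up purely for illustration, not my real data) looks like this:

from langchain_text_splitters import RecursiveJsonSplitter

# Made-up stand-in for my real JSON payload:
sample = {"title": "example", "sections": [{"heading": "intro", "body": "some text"}]}

splitter = RecursiveJsonSplitter(max_chunk_size=2000)

# create_documents() expects a list of dicts, one per JSON document:
docs = splitter.create_documents(texts=[sample])
print(docs[0].page_content)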
When I then attempt to obtain a similarity score using:
results = chroma_db.similarity_search_with_relevance_scores(query_object, k=1)
I get the following error:
File "/venv/lib/python3.12/site-packages/langchain_community/embeddings/huggingface.py", line 99, in <lambda>
texts = list(map(lambda x: x.replace("n", " "), texts))
^^^^^^^^^
AttributeError: 'dict' object has no attribute 'replace'
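If it helps narrow things down, the same traceback can be reproduced outside of Chroma by calling the embedding function directly with a dict instead of a string; a minimal sketch (the model name is only a placeholder, not necessarily the one I use):

from langchain_community.embeddings import HuggingFaceEmbeddings

# Placeholder model for illustration:
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

embedding_function.embed_query("a plain string embeds fine")

# A dict reaches the same x.replace("\n", " ") line and raises:
embedding_function.embed_query({"question": "this raises AttributeError"})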
I have been looking everywhere for a solution to this problem, but nothing has helped so far. I am using sentence-transformers version 3.0.1.
Can somebody please help?