I would like to add and remove documents from ChromaDB using LangChain without creating a new vector store every time. I understand that this can be done by referencing document ids, but how does it work when the documents are split into chunks? Does each individual chunk need its own id?
Here is my code:
def process_documents(self, docs):
    print("Splitting and embedding documents...")
    chunks = self.text_splitter.split_documents(docs)
    if os.path.exists(self.chroma_path):
        shutil.rmtree(self.chroma_path)
        print("Deleting current chroma path:" + self.chroma_path)
    example_db = Chroma.from_documents(chunks, self.embeddings, persist_directory=self.chroma_path, ids=self.ids)
I know that you can remove documents from a collection like so:
example_db._collection.delete(ids=[ids[-1]])
But how would you do this with chunks, as seen above?
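To make the question concrete, here is a minimal sketch of one scheme I could imagine: derive one id per chunk from its parent document, so that all chunks of a document can be found again later. It is plain Python and does not touch Chroma itself. The helper names `assign_chunk_ids` and `ids_for_document` are hypothetical, and it assumes each chunk carries its parent document's identifier (for example in `chunk.metadata["source"]`) and that those identifiers contain no `:` character.

```python
from collections import defaultdict


def assign_chunk_ids(chunk_sources):
    """Return one id per chunk, in the form '<source>:<n>'.

    chunk_sources lists the parent-document identifier of each chunk,
    in the same order as the chunks themselves.
    """
    counters = defaultdict(int)
    ids = []
    for source in chunk_sources:
        ids.append(f"{source}:{counters[source]}")
        counters[source] += 1
    return ids


def ids_for_document(all_ids, source):
    """Select every chunk id that belongs to one parent document.

    Assumes source identifiers contain no ':' so the prefix is unambiguous.
    """
    prefix = f"{source}:"
    return [chunk_id for chunk_id in all_ids if chunk_id.startswith(prefix)]


# Hypothetical sources, e.g. [c.metadata["source"] for c in chunks].
chunk_sources = ["a.txt", "a.txt", "b.txt", "a.txt"]
chunk_ids = assign_chunk_ids(chunk_sources)
to_delete = ids_for_document(chunk_ids, "a.txt")
print(chunk_ids)
print(to_delete)
```

With something like this, `chunk_ids` would be passed as `ids=` in place of `self.ids` above, and `to_delete` would go into the `delete(ids=...)` call. Is that the intended way to do it?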