Context: I’m trying to implement an advanced RAG pipeline that uses an Auto-merging Retriever (aka Parent Document Retriever) against a specific VectorDB (for example, Pinecone).
It looks like all of the LlamaIndex / LangChain tutorials assume the end user uses a generic “index” that can represent any VectorDB, but it’s not clear to me how I can adapt their code samples to a specific VectorDB.
In particular, how can I save the following setup from https://docs.llamaindex.ai/en/stable/examples/retrievers/recursive_retriever_nodes.html#metadata-references-summaries-generated-questions-referring-to-a-bigger-chunk:
from llama_index import VectorStoreIndex
...
vector_index_chunk = VectorStoreIndex(
    all_nodes, service_context=service_context
)
...
from llama_index.retrievers import RecursiveRetriever
...
retriever_metadata = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_metadata},
    node_dict=all_nodes_dict,
    verbose=True,
)
in a VectorDB (for example, Pinecone)?
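For the VectorStoreIndex part, this is roughly what I assume the wiring would look like, based on my reading of the docs (just a sketch; the API key, environment, and index name are placeholders, and I haven’t verified it):

import pinecone
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import PineconeVectorStore

# Placeholder credentials / index name -- not a real setup.
pinecone.init(api_key="...", environment="us-west1-gcp")
pinecone_index = pinecone.Index("my-index")

# Wrap the Pinecone index so VectorStoreIndex writes embeddings there
# instead of into the default in-memory vector store.
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index_chunk = VectorStoreIndex(
    all_nodes, storage_context=storage_context, service_context=service_context
)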
While I can see how I could sub-optimally save the VectorStoreIndex to Pinecone by writing a lot of metadata (even though I suspect there’s a convenient library method for it), I don’t understand at all how I could leverage these RecursiveRetriever objects with the Pinecone client libraries (especially given that my microservice isn’t written in Python).
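The only workaround I can think of for the node_dict is persisting the nodes separately, e.g. with a SimpleDocumentStore, which keeps the parent nodes outside Pinecone entirely (again, just a sketch, and the file path is made up):

from llama_index.storage.docstore import SimpleDocumentStore

# Persist the parent/reference nodes that RecursiveRetriever needs in memory;
# Pinecone itself only stores vectors plus flat metadata.
docstore = SimpleDocumentStore()
docstore.add_documents(all_nodes)
docstore.persist("all_nodes_docstore.json")

# ...later, in another process:
docstore = SimpleDocumentStore.from_persist_path("all_nodes_docstore.json")
all_nodes_dict = docstore.docs  # node_id -> node mapping for RecursiveRetriever

But even if that works, the retrieval logic itself still lives in Python, which doesn’t help my non-Python microservice.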
I tried searching on GitHub but didn’t manage to find anything relevant, which was very surprising to me.