Following on from the example here, one way to query a ChromaDB collection while filtering on a given metadata field (e.g. “source_type”) is:
results = collection.query(
    query_texts=["This is a question or text"],
    n_results=5,
    where={"source_type": "guideline"},
)
Now suppose the “source_type” metadata takes various values in this database beyond just “guideline”, and we’re interested in finding one vector from each type. Is there a way to construct a query such that each of the n results has a unique value for that metadata field?
I’ve checked the ChromaDB documentation, but this functionality does not seem to exist. I can always brute-force it by increasing n_results and looping over the results to keep the top n that have distinct values for “source_type”, but I wanted to ask whether a more efficient solution exists.
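For reference, the brute-force workaround I have in mind looks roughly like the sketch below. The helper name `top_unique_by_metadata` and the over-fetch size of 50 are my own placeholders, not part of ChromaDB. It assumes the result has the usual `collection.query` shape for a single query text (parallel nested lists under "ids", "metadatas", "documents" and "distances") and that hits come back ordered from nearest to farthest.

```python
from typing import Any


def top_unique_by_metadata(
    results: dict[str, Any], field: str, n: int
) -> list[dict[str, Any]]:
    """Keep the first (assumed closest) hit per distinct value of `field`.

    `results` is assumed to be the dict returned by collection.query for
    ONE query text, so each entry is a list nested one level deep.
    """
    ids = results["ids"][0]
    metadatas = results["metadatas"][0]
    documents = results["documents"][0]
    distances = results["distances"][0]

    seen = set()      # metadata values already represented
    unique_hits = []  # at most n hits, one per distinct value
    for id_, meta, doc, dist in zip(ids, metadatas, documents, distances):
        value = (meta or {}).get(field)
        # Skip hits that lack the field or repeat a value we already kept.
        if value is None or value in seen:
            continue
        seen.add(value)
        unique_hits.append(
            {"id": id_, "document": doc, "metadata": meta, "distance": dist}
        )
        if len(unique_hits) == n:
            break
    return unique_hits


# Hypothetical usage: over-fetch, then de-duplicate on "source_type".
# results = collection.query(
#     query_texts=["This is a question or text"],
#     n_results=50,  # arbitrary over-fetch size
# )
# hits = top_unique_by_metadata(results, "source_type", n=5)
```

The obvious drawback is that there is no guarantee the over-fetched batch contains n distinct types, so n_results may have to be raised and the query repeated.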