I have a collection where I store metadata for documents uploaded by users (saved fields: user_id, doc_id, time, filename, etc.).
If a user has uploaded 175K documents and I need to fetch all 175K entries (I only need the doc_ids), it takes 20-30 seconds, even with doc_id as the only projected field.
    all_doc_ids = list(
        doc_metadata.find(
            {"user_id": user_id},
            projection={"_id": 0, "doc_id": 1}
        )
    )
How can I speed up this query, given that I need the doc_id of every document?
PS: org_id is part of a compound index.
I have tried fetching the documents in chunks (Python) on multiple simultaneous threads. It works, but the later threads are throttled, taking almost 10 seconds to fetch even 10,000 documents.
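A minimal sketch of what such a chunked, multi-threaded fetch might look like, assuming the chunks are cut with `skip`/`limit` over a `doc_id` sort. The helper names `fetch_chunk` and `fetch_all_doc_ids`, the chunk size, and the worker count are hypothetical, and this is not necessarily the exact code that was run:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(collection, user_id, skip, limit):
    """Fetch one chunk of doc_ids for a user (hypothetical helper)."""
    cursor = collection.find(
        {"user_id": user_id},
        projection={"_id": 0, "doc_id": 1},
        sort=[("doc_id", 1)],  # stable order so chunks do not overlap
        skip=skip,
        limit=limit,
    )
    return [doc["doc_id"] for doc in cursor]


def fetch_all_doc_ids(collection, user_id, total, chunk_size=10_000, workers=4):
    """Fetch all doc_ids in fixed-size chunks on a thread pool (assumed setup)."""
    offsets = range(0, total, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the offsets.
        chunks = pool.map(
            lambda skip: fetch_chunk(collection, user_id, skip, chunk_size),
            offsets,
        )
        return [doc_id for chunk in chunks for doc_id in chunk]
```

Here `total` could come from `doc_metadata.count_documents({"user_id": user_id})`. If the chunking does rely on `skip` like this, the server still has to walk past every skipped entry for each chunk, which would be consistent with the later chunks being the slow ones.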