I have a collection in MongoDB with ~2 million documents; the average document size is 28KB. There are several fields, but we are concerned with three of them: “time”, “test_time” and “arr”. “time” is a string field that stores the date and time as a string, while “test_time” is a Date field that stores the exact same date and time as a Date. “arr” is an array of objects with ~10 fields in each object, and the average length of “arr” is ~5. We are concerned with only one of those fields, “name”, so basically arr.name.
The pipeline is as follows – a match stage on the “time” field with greater-than and less-than bounds on some particular dates X and Y, followed by a projection stage projecting only {arr.name: 1, _id: 0}.
In another scenario I run the same pipeline, but on “test_time” instead of “time”, with the same projection as above.
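A rough sketch of the two pipelines in pymongo (the connection, database/collection names, and the X/Y boundaries below are placeholders, not my real values):

```python
# Sketch of the two pipelines; "mydb"/"mycoll" and the date boundaries
# are placeholders standing in for the real values X and Y.
from datetime import datetime
from pymongo import MongoClient

coll = MongoClient()["mydb"]["mycoll"]

# Scenario 1: range match on the string field "time"
pipeline_string = [
    {"$match": {"time": {"$gt": "2020-07-01 00:00:00.000",    # X
                         "$lt": "2020-07-10 00:00:00.000"}}},  # Y
    {"$project": {"arr.name": 1, "_id": 0}},
]

# Scenario 2: the same range match, but on the Date field "test_time"
pipeline_date = [
    {"$match": {"test_time": {"$gt": datetime(2020, 7, 1),
                              "$lt": datetime(2020, 7, 10)}}},
    {"$project": {"arr.name": 1, "_id": 0}},
]

docs_from_string = list(coll.aggregate(pipeline_string))
docs_from_date = list(coll.aggregate(pipeline_date))
```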
Q1. I do not see a difference in speed between range queries on the STRING field and the DATE field. I would have expected searching on the Date type in MongoDB to be faster. Could you help me investigate why this could be the case?
Q2. I have the following indexes – time_1, test_time_1, time_1_arr.name_1 and test_time_1_arr.name_1.
By default MongoDB uses the time_1 or test_time_1 index and not the compound indexes that include arr.name. I understand it is not a covered query because my match condition is only on the time field and not on the name field. I am not sure how to make it a covered query so that it runs much faster.
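For reference, the indexes were created along these lines (again a sketch; the collection handle is the same placeholder as above):

```python
# Sketch of the existing indexes; the field names match the question,
# the collection handle is the placeholder from the earlier snippet.
from pymongo import ASCENDING, MongoClient

coll = MongoClient()["mydb"]["mycoll"]

coll.create_index([("time", ASCENDING)])                                 # time_1
coll.create_index([("test_time", ASCENDING)])                            # test_time_1
coll.create_index([("time", ASCENDING), ("arr.name", ASCENDING)])        # time_1_arr.name_1
coll.create_index([("test_time", ASCENDING), ("arr.name", ASCENDING)])   # test_time_1_arr.name_1
```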
Currently docsExamined = keysExamined = nReturned, which means it is reading documents from disk to examine them. Is there some way I could optimise this?
Fetching ~35,000 documents takes ~6 seconds, which seems like a lot given that I am projecting only one field (arr.name) and a single document is ~28KB. I also realise that an aggregation pipeline is assigned only 100MB of RAM by default, so how can I make full use of my machine's RAM?
I do not see a change in the speed when searching for ranged queries on a STRING field VS a DATE field.
MongoDB stores data in BSON format. According to the BSON Spec, a string would be (e_name is the null-terminated field name):
signed_byte(2) e_name string
and Date would be:
signed_byte(9) e_name int64
The index will store the value without the field name. Assuming an ISO 8601-style format for the string like “2020-07-10 15:00:00.000”, a greater-than/less-than comparison using the index would be comparing 24 bytes per document vs 8 bytes per document. However, the comparison can stop early once a difference is found, so it is unlikely to need to compare the full number of bytes for each.
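You can get a rough feel for the size difference by encoding the same timestamp both ways with pymongo's bson package (the field name "t" and the values are just examples, and the encoded documents include per-document overhead that an index key would not have):

```python
# Compare the encoded size of the same timestamp stored as a string vs a Date.
# The bson package ships with pymongo; the field name "t" is arbitrary.
from datetime import datetime
import bson

as_string = bson.encode({"t": "2020-07-10 15:00:00.000"})
as_date = bson.encode({"t": datetime(2020, 7, 10, 15, 0, 0)})

# The string version is noticeably larger than the Date version.
print(len(as_string), len(as_date))
```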
When querying using an index, MongoDB seeks within the index to find the first matching key, then iterates the index until a non-matching key is found. Since the query is not covered, fetching the 28KB document will dwarf the time needed for the comparison, even if the document is already in cache.
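One way to confirm that the document fetch is where the time goes is to ask the server for executionStats on the equivalent find and look at the IXSCAN/FETCH stages (a sketch; the names and date boundaries are placeholders):

```python
# Sketch: run explain with executionStats verbosity on the equivalent find.
# "mydb"/"mycoll" and the date boundaries are placeholders.
from datetime import datetime
from pymongo import MongoClient

db = MongoClient()["mydb"]

plan = db.command(
    "explain",
    {
        "find": "mycoll",
        "filter": {"test_time": {"$gt": datetime(2020, 7, 1),
                                 "$lt": datetime(2020, 7, 10)}},
        "projection": {"arr.name": 1, "_id": 0},
    },
    verbosity="executionStats",
)

stats = plan["executionStats"]
# keysExamined == docsExamined == nReturned indicates every matching key
# leads to a full document fetch.
print(stats["totalKeysExamined"], stats["totalDocsExamined"], stats["nReturned"])
```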
I understand it is not a covered query because in my match condition there is only a match condition on the time field and not the name field. I am not sure how to make it a covered query so that it can run much faster.
The query is not covered because it cannot be fulfilled by the index alone.
When MongoDB indexes an array, a separate index key is generated for each array element. This means that your documents with a 5-element array will have 5 index keys, with each index key containing only one of the array elements.
The projection arr.name: 1 means it should return an array containing each of the documents in the array, with each containing only the name field. Since that cannot be directly established from a single index entry, it fetches the document so it can evaluate the array as a whole.
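To illustrate with a made-up document (all values here are placeholders):

```python
# Illustration only: one document and the index keys that the multikey
# index time_1_arr.name_1 would conceptually generate for it.
doc = {
    "_id": 1,
    "time": "2020-07-10 15:00:00.000",
    "arr": [{"name": "a"}, {"name": "b"}, {"name": "c"},
            {"name": "d"}, {"name": "e"}],
}

# One index key per array element, each pointing back to the same document:
#   ("2020-07-10 15:00:00.000", "a") -> _id 1
#   ("2020-07-10 15:00:00.000", "b") -> _id 1
#   ... and so on for "c", "d", "e"

# What the projection {"arr.name": 1, "_id": 0} has to return:
expected = {"arr": [{"name": "a"}, {"name": "b"}, {"name": "c"},
                    {"name": "d"}, {"name": "e"}]}
# No single index key contains the whole array, so the document must be fetched.
```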
In theory, it might be possible to accumulate the index keys and combine the keys that share the same _id, but the MongoDB query executor does not do that.