Data:
I have MongoDB setup to hold information about different devices. Each device can contain multiple nodes which have parent-child relationship. Each device can be applicable to multiple platforms and releases.
Expected functionality:
User should be able to search for a part of the text and view all the devices which have such node names and all platform, releases they are applicable to. User should also be able to filter the result for a few platforms/releases.
Sample data:
- release collection:
{
device_id:"7f96a17585f028ae13bd0d1a390461e9",
device-name:"apts26v",
platform:"bj5000",
release:"10.0"
}
{
device_id:"7f96a17585f028ae13bd0d1a390461e9",
device-name:"apts26v",
platform:"bj5000",
release:"10.5"
}
{
device_id:"7f96a17585f028ae13bd0d1a390461e9",
device-name:"apts26v",
platform:"rex9000",
release:"10.5"
}
- node collection:
{
device_id:"7f96a17585f028ae13bd0d1a390461e9",
text:"tree-node",
xpath:"/module/active/tree-node"
}
Index:
- I have an index in release collection for device id, platform and release
- I have a text index in node collection for text
Amount of documents:
The database holds large amount of data and the quantity is expected to increase further.
- release collection has around 400k documents
- node collection has around 35 million documents
Approach 1:
Getting all the results for search string provides timeout error. So I decided to perform server side pagination and get only 50 rows at a time. If I simply perform a search on node collection and limit to 50 rows, the query is working fine. But when I perform a lookup with release collection, remove the rows which do not have selected platform and release, and then limit by 50 rows, the query provides timeout error.
Approach 2:
If I perform the filtering of platform and release after the limit stage, results are returned but with different number of documents in each page. If in the first 50 query results, if only 10 are applicable to the selected platform and release, the first page will have only 10 entries.
Aggregation:
node.aggregate([
{
"$match": {
"text":{
"$regex":search_text,
"$options":"i"
}
}
},
{ "$sort": { "device_id": 1 ,"text":1} },
{
"$lookup": {
"from": "release",
"let": { "deviceId": "$device_id" },
"pipeline": [
{
"$match": {
"platform": "rex9000",
"release":"10.5",
"$expr": { "$eq": [ "$$deviceId", "$device_id" ] }
}
}
],
"as": "releasePlatformInfo"
}
},
{
"$match": {
"releasePlatformInfo": {"$ne": []}
}
},
{"$skip": 0},
{"$limit": 50}
])
How should I sort this out? Should I can resolve this by changing the aggregation query or restructuring the data? Please give me your suggestions.
2