I was experimenting with Atlas Search in MongoDB and found some strange behavior.
Consider a collection of 100000 documents that look like this:
{
  _id: "1",
  description: "Lorem Ipsum",
  creator: "UserA"
}
With an Atlas Search index with this basic definition:
{
  mappings: { dynamic: true }
}
For the purpose of the example, the Atlas Search index is the only created index on this collection.
Here are some aggregation pipelines and the estimated execution time for each of them:
$search alone ~100ms
[
  {
    $search: {
      wildcard: {
        query: "*b*",
        path: {
          wildcard: "*"
        },
        allowAnalyzedField: true
      }
    }
  }
]
$search with simple $match that returns nothing ~25 seconds (Keep in mind this is only 100000 documents)
[
  {
    $search: {
      wildcard: {
        query: "*b*",
        path: {
          wildcard: "*"
        },
        allowAnalyzedField: true
      }
    }
  },
  {
    $match: { creator: null }
  },
  {
    $limit: 100
  }
]
$match alone that returns nothing ~100ms
[
  {
    $match: { creator: null }
  },
  {
    $limit: 100
  }
]
Assuming that all documents match the $search, both of those $match stages need to scan all documents.
I thought maybe it’s because $match is the first stage and Mongo can work directly on the collection, but no, this intentionally unoptimized pipeline works just fine:
$match with $set to force the $match to work directly on the pipeline ~200ms
[
  {
    $set: {
      creator: {
        $concat: ["$creator", "ABC"]
      }
    }
  },
  {
    $match: {
      creator: null
    }
  },
  {
    $limit: 100
  }
]
I get similar results when replacing $match with $sort.
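For reference, the $sort variant looks roughly like the sketch below. The sort key (creator, ascending) and the variable names are assumptions chosen for illustration; they are not taken from a specific run.

```javascript
// Same $search stage as in the slow pipeline, followed by $sort instead of $match.
// Assumption: the sort key ({ creator: 1 }) is illustrative only.
const wildcardSearchStage = {
  $search: {
    wildcard: {
      query: "*b*",
      path: { wildcard: "*" },
      allowAnalyzedField: true
    }
  }
};

const sortAfterSearch = [
  wildcardSearchStage,
  { $sort: { creator: 1 } },
  { $limit: 100 }
];
```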
I know Atlas Search discourages the use of $match and $sort after $search and offers alternatives, but it seems like performance shouldn’t be that bad. I have a very specific use case that would really benefit from being able to use $match or $sort after a $search, and the alternatives proposed by Mongo aren’t quite what I need.
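To make it concrete, the kind of alternative I mean is moving the filter into $search itself with the compound operator. This is only a sketch: I am assuming that a mustNot clause with the exists operator is close enough to $match: { creator: null }, which may not hold exactly (for example for fields explicitly set to null).

```javascript
// Sketch: express the "creator is missing" condition inside $search,
// so that no separate $match stage is needed afterwards.
// Assumption: mustNot + exists approximates { creator: null }; the two
// may differ in how explicit null values are treated.
const filteredSearch = [
  {
    $search: {
      compound: {
        must: [
          {
            wildcard: {
              query: "*b*",
              path: { wildcard: "*" },
              allowAnalyzedField: true
            }
          }
        ],
        mustNot: [
          { exists: { path: "creator" } }
        ]
      }
    }
  },
  { $limit: 100 }
];
```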
What could explain this? Is it a lack of optimization on Mongo’s side, or is this a bug?
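In case it helps anyone dig into this, one way to compare the fast and slow pipelines is to look at their explain output. The helper below is a minimal sketch: explainPipeline is a hypothetical name, and it assumes a mongosh-style collection object where coll.explain(verbosity).aggregate(pipeline) is available.

```javascript
// Hypothetical helper: returns the explain output for a pipeline so the
// fast and slow variants can be compared stage by stage.
// Assumes a mongosh-style collection (coll.explain(...).aggregate(...)).
function explainPipeline(coll, pipeline) {
  return coll.explain("executionStats").aggregate(pipeline);
}

// Example usage in mongosh (the collection name "items" is an assumption):
// explainPipeline(db.getCollection("items"), [
//   { $match: { creator: null } },
//   { $limit: 100 }
// ]);
```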