I need to make a query with aggregations in MongoDB. I’ve already built 2 different aggregation pipelines that give me the result I need, so I have to decide which one to stick with.
I was reading the documentation and I saw that MongoDB has a built-in profiler for this purpose. So what I’ve done is run the aggregations and review the profiler results getting something like this:
{
op: 'command',
ns: 'auth.users',
command: {
explain: {
aggregate: 'users',
pipeline: [
{
'$match': {
'accessControl.organization': ObjectId('6638e821b35e766556ebc5de')
}
},
{
'$addFields': {
accessControl: {
'$arrayElemAt': [
{
'$filter': {
input: '$accessControl',
as: 'item',
cond: {
'$eq': [
'$$item.organization',
ObjectId('6638e821b35e766556ebc5de')
]
}
}
},
0
]
}
}
},
{
'$group': {
_id: {
role: '$accessControl.role.name',
id: '$accessControl.role.id'
},
users: {
'$push': {
_id: '$_id',
disabled: '$disabled',
email: '$email',
emailVerified: '$emailVerified',
name: '$name',
accessControl: [
'$accessControl'
]
}
}
}
},
{
'$project': {
_id: 0,
role: '$_id',
users: 1
}
}
],
cursor: {}
},
verbosity: 'queryPlanner',
lsid: {
id: UUID('c35ecb9e-abc5-4286-ac78-5d74ea6d9bf4')
},
'$db': 'auth'
},
numYield: 0,
queryHash: '7A34A1A0',
planCacheKey: '7A34A1A0',
queryFramework: 'classic',
locks: {
FeatureCompatibilityVersion: {
acquireCount: {
r: 2
}
},
Global: {
acquireCount: {
r: 2
}
}
},
flowControl: {},
responseLength: 2437,
protocol: 'op_msg',
cpuNanos: 799390,
millis: 0,
planSummary: 'COLLSCAN',
planningTimeMicros: 200,
ts: 2024-05-14T21:22:23.875Z,
client: '172.18.0.1',
appName: 'MongoDB Compass',
allUsers: [
{
user: 'abc123',
db: 'admin'
}
],
user: 'abc123@admin'
}
I’m not really sure how to interpret this to decide which one to use. I understand that millis determines the number of milliseconds it takes to execute the query, but in both cases I get 0 (which causes me a bit of confusion). Maybe I need to add more users since I’m trying some randomly generated fake documents. So I decided to be guided by cpuNanos, which as I understand, is the amount of cpu used in nanoseconds. But in both cases the values range between 300,000 and 1,000,000 (sometimes higher values in the first pipeline and sometimes higher values in the second). So what should I do to decide? Should I run them many times, average, and see which one is greater? Or what other field can help me decide.