I’m asking for your help, MongoDB experts! First of all, thank you for taking the time to read this.
I ran into the following scenario: I’m running an aggregation on the MasterCollection collection and “joining” it with 9 other collections in the aggregation.
At the end of the pipeline, I merge everything back into the same MasterCollection. The aggregation takes 30 minutes to run, which is not acceptable. We have a single MongoDB instance (MongoDB 7) with 16 GB of RAM, running in a Docker container.
MasterCollection has 1,015,787 documents, with an average document size of 1.8 kB. Additional stats for the other collections:
Collection name    Documents      Avg. doc size
collection1        1,016,878      40 B
collection2        0              0 B
collection3        232            94 B
collection4        10,289         97 B
collection5        10,289         97 B
collection6        1,747          102 B
collection7        1,326          103 B
collection8        1,016,878      42 B
collection9        1,016,878      58 B
Compound indexes are created for the fields that are used in the lookups.
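Roughly, the index definitions look like this (a simplified sketch using the same anonymized field names as the pipeline below; collection1, collection8 and collection9 are joined on _id, which is indexed by default):

db.collection2.createIndex({ '_id.field1': 1, '_id.field2': 1 })
db.collection3.createIndex({ '_id.field1': 1, '_id.field2': 1 })
db.collection4.createIndex({ '_id.field1': 1, '_id.field2': 1 })
db.collection5.createIndex({ '_id.field1': 1, '_id.field2': 1 })
db.collection6.createIndex({ '_id.field1': 1, '_id.field3': 1 })
db.collection7.createIndex({ '_id.field1': 1, '_id.field2': 1 })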
My aggregation looks like this:
MasterCollection.aggregate([
  {
    $project: {
      _id: 1,
      field1: 1,
      field2: 1,
      field3: 1,
    },
  },
  {
    $lookup: {
      from: 'collection1',
      localField: '_id',
      foreignField: '_id',
      as: 'collection1',
    },
  },
  {
    $lookup: {
      from: 'collection8',
      localField: '_id',
      foreignField: '_id',
      as: 'collection8',
    },
  },
  {
    $lookup: {
      from: 'collection9',
      localField: '_id',
      foreignField: '_id',
      as: 'collection9',
    },
  },
  {
    $lookup: {
      from: 'collection2',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection2: 1,
          },
        },
      ],
      as: 'collection2',
    },
  },
  {
    $lookup: {
      from: 'collection3',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection3: 1,
          },
        },
      ],
      as: 'collection3',
    },
  },
  {
    $lookup: {
      from: 'collection4',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection4: 1,
          },
        },
      ],
      as: 'collection4',
    },
  },
  {
    $lookup: {
      from: 'collection5',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection5: 1,
          },
        },
      ],
      as: 'collection5',
    },
  },
  {
    $lookup: {
      from: 'collection6',
      let: {
        field1Id: '$field1',
        field3Id: '$field3',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field3', '$$field3Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection6: 1,
          },
        },
      ],
      as: 'collection6',
    },
  },
  {
    $lookup: {
      from: 'collection7',
      let: {
        field1Id: '$field1',
        field2Id: '$field2',
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$_id.field1', '$$field1Id'] },
                { $eq: ['$_id.field2', '$$field2Id'] },
              ],
            },
          },
        },
        {
          $project: {
            _id: 0,
            fieldFromCollection7: 1,
          },
        },
      ],
      as: 'collection7',
    },
  },
  {
    $unwind: '$collectionN', // one $unwind stage like this for each joined collection array
  },
  {
    $project: {
      _id: 1,
      // the fields projected from each joined collection
    },
  },
  {
    $merge: {
      into: 'MasterCollection',
      on: '_id',
      whenMatched: 'merge',
      whenNotMatched: 'discard',
    },
  },
], { allowDiskUse: true })
Do you have any suggestions for improving this aggregation?
I have already tried tweaking the indexes, using $facet, and splitting the whole aggregation into chunks with $skip and $limit (the chunked variant looked roughly like the sketch below), but none of these attempts improved the execution time.
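Simplified sketch of the chunked attempt (the chunk size and the surrounding loop are illustrative, not the exact code I ran):

// Run the same pipeline over the master collection in fixed-size chunks
const chunkSize = 100000; // illustrative value
const total = db.MasterCollection.countDocuments();
for (let skip = 0; skip < total; skip += chunkSize) {
  db.MasterCollection.aggregate([
    { $sort: { _id: 1 } },   // keep a stable order so chunks don't overlap
    { $skip: skip },
    { $limit: chunkSize },
    // ...the same $project / $lookup / $unwind / $merge stages as above...
  ], { allowDiskUse: true });
}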