I am working on a project that involves real-time analytics on a large dataset stored in a sharded MongoDB cluster. I need to use the aggregation pipeline to process this data efficiently, but I am experiencing performance issues. Specifically, I want to ensure that my queries run with minimal lag. Here is the pipeline I am using:
const pipeline = [
  { $match: { status: "active" } },                             // filter first so later stages see fewer documents
  { $group: { _id: "$category", total: { $sum: "$value" } } },  // sum value per category
  { $sort: { total: -1 } },
  { $limit: 10 }
];

// Recent versions of the Node.js driver removed the callback overload of
// toArray(), so I switched to async/await.
const result = await db.collection.aggregate(pipeline, { allowDiskUse: true }).toArray();
console.log(result);
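To be concrete about what I expect the pipeline to produce, here is the same logic expressed in plain JavaScript over a small made-up sample (the documents below are invented for illustration; only the field names come from the pipeline above):

```javascript
// Made-up sample documents mirroring the collection's shape.
const docs = [
  { status: "active", category: "A", value: 10 },
  { status: "active", category: "B", value: 5 },
  { status: "inactive", category: "A", value: 99 },
  { status: "active", category: "A", value: 7 },
];

// $match + $group with $sum, done by hand.
const totals = {};
for (const d of docs) {
  if (d.status !== "active") continue;                        // $match
  totals[d.category] = (totals[d.category] || 0) + d.value;   // $group / $sum
}

// $sort { total: -1 } followed by $limit 10.
const result = Object.entries(totals)
  .map(([category, total]) => ({ _id: category, total }))
  .sort((a, b) => b.total - a.total)
  .slice(0, 10);

console.log(result); // [ { _id: 'A', total: 17 }, { _id: 'B', total: 5 } ]
```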
I have set `allowDiskUse: true` in the aggregation options, which helps with memory usage, but I am still experiencing delays. I expected the aggregation pipeline to run more efficiently and provide results quickly. I also tried indexing the fields used in the pipeline, but with limited success.
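For reference, the index I tried was along these lines (a sketch in mongosh syntax; `collection` stands in for my real collection name, and the field names are taken from the pipeline):

```javascript
// Compound index intended to let the $match on status use the index,
// with category and value available for the $group stage.
db.collection.createIndex({ status: 1, category: 1, value: 1 });
```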
I am looking for best practices or optimizations that can help reduce the lag in my queries.