Are there limitations to how much MongoDB can scale when it comes to querying millions of documents?
I’m designing a social site, similar to Twitter, where users will have a chronological feed of posts. When I first tested for scale, my queries took upwards of 30s against 1M documents, which was far too long for a social feed. I ended up denormalizing the posts schema, which dropped the query time to about 1s for 1M records, but the same query against 10M documents now takes upwards of 30s again.
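For context, the denormalization just means the engagement totals are stored on the post itself, so the feed query never has to touch other collections. A rough sketch of the Post schema after that change (the content field and the exact types are illustrative, simplified from my real schema):

import { Schema, model } from "mongoose";

// Sketch of the denormalized Post schema: engagement totals live directly on the post,
// so the feed query never joins or counts a separate likes/comments collection.
const postSchema = new Schema({
  author: { type: Schema.Types.ObjectId, ref: "User", required: true },
  content: { type: String, required: true }, // illustrative field
  createdAt: { type: Date, default: Date.now },
  likes: { type: Number, default: 0 },     // denormalized like total
  comments: { type: Number, default: 0 },  // denormalized comment total
});

const Post = model("Post", postSchema);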
Here is an example of a query that I would be running:
let matchConditions: any = {
  createdAt: { $gte: threeDaysAgo },
};

if (user) {
  matchConditions._id = { $nin: user.blockedPosts };
  matchConditions.author = { $nin: user.blockedUsers };
}

....

const popularPosts = await Post.aggregate([
  { $match: matchConditions },
  {
    $addFields: {
      popularity: {
        $add: [
          { $multiply: [{ $sum: "$likes" }, 1.1] },
          { $multiply: [{ $sum: "$comments" }, 1.2] },
        ],
      },
    },
  },
  { $sort: { popularity: -1 } },
  { $skip: (page - 1) * limit },
  { $limit: 30 },
]);
The strategy I am currently implementing is to find all matching posts, run my popularity algorithm over them, and take the first 30. I’ve created an index on createdAt and a compound index on likes and comments, but that hasn’t done anything for performance. Even a simple Post.find().limit(30) takes over 25s at 10M records.
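For reference, the indexes are defined roughly like this on the schema above (the sort directions are from memory; the point is one index on createdAt and one compound index covering likes and comments):

// Indexes as described: one on createdAt for the $match window,
// one compound index on the fields used by the popularity calculation.
postSchema.index({ createdAt: -1 });
postSchema.index({ likes: -1, comments: -1 });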
Am I reaching the limits of query/index optimization? I’ve read that Twitter handles over 500M tweets per day. What strategies could be used to query that many documents so quickly?
Is this a situation where sharding would help?
Any resources, references, or documentation would be very helpful.