I’m currently dealing with performance issues in a Django application that’s structured as follows:
- User – Social Network – Post – Tag/Label
Each entity is represented by its own table. The number of tags is growing exponentially, even after filtering out less significant ones (tags are assigned by an AI, not humans). Here’s what I’m considering to optimize performance:
- **Separate tables for insignificant tags:** Move the less significant tags to a different table, or even a different database; for instance, keeping 100k rows in `MediaTags` and 10M rows in `SmallMediaTags`. Would this improve the performance of the main tables?
- **Denormalization:** Currently each post is linked to 2-5 primary tags, so the number of tag rows is 2-5 times the number of posts, with occasional outliers having 6+ tags. This puts a heavy load on `MediaTags`, complicates searches, and incurs extra cost converting `MediaTags` rows into objects. However, if we stored the top-N tags directly with the posts (e.g., the top 10), the ratio of tag storage to posts would be nearly 1:1: all required tags would be part of the post rows themselves, eliminating the need for additional table queries (see the sketch just below).
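For the denormalization option, here is a minimal sketch of what it could look like, assuming PostgreSQL; the model and field names (`Post`, `top_tags`) are illustrative, not your actual schema:

```python
# models.py — hypothetical sketch: top-N tag IDs denormalized onto the post row.
# Assumes PostgreSQL; ArrayField and GIN indexes are Postgres-specific.
from django.contrib.postgres.fields import ArrayField
from django.contrib.postgres.indexes import GinIndex
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=255)
    # Up to 10 tag IDs, refreshed whenever the AI re-tags the post.
    top_tags = ArrayField(models.IntegerField(), size=10, default=list)

    class Meta:
        indexes = [
            # GIN index keeps array-containment lookups fast at scale.
            GinIndex(fields=["top_tags"]),
        ]
```

With this in place, `Post.objects.filter(top_tags__contains=[tag_id])` answers "posts carrying tag X" without touching `MediaTags` at all; the trade-off is that you maintain the array on every re-tag and keep `MediaTags` around as the source of truth.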
We have recently started displaying lists of posts with specific tags, which has become a pain point. The current process involves:
- Finding posts
- Filtering them to keep only those with the required tags
- Aggregating them (would aggregating on the max value be faster?)
- Sorting them
- Applying pagination
This is an incredibly slow operation.
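If any of those steps currently happen in Python rather than SQL, the first thing to try is expressing the whole pipeline as a single queryset, so the database filters, aggregates, sorts, and limits in one pass. A hedged sketch, assuming `MediaTags` has `post`, `tag_id`, and `score` fields (the names are guesses based on the description above):

```python
from django.core.paginator import Paginator
from django.db.models import Max

# One SQL query instead of five Python steps: filter, aggregate,
# sort, and paginate all happen inside the database.
posts = (
    Post.objects
    .filter(mediatags__tag_id__in=required_tag_ids)  # posts with any required tag
    .annotate(max_score=Max("mediatags__score"))     # aggregate per post (MAX is cheap)
    .order_by("-max_score")                          # sort in the DB, not in Python
)

page = Paginator(posts, 20).get_page(page_number)    # LIMIT/OFFSET pagination
```

On the aggregation question: `MAX` is about as cheap as aggregates get, but the bigger win is computing it in SQL over an indexed column rather than in application code. Two caveats: `__in` matches posts having *any* of the required tags (for *all*-of matching, annotate a distinct `Count` and filter on it), and `OFFSET`-based pagination degrades on deep pages, where keyset (seek) pagination holds up better.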
Any advice on how to optimize our system would be greatly appreciated.