I am developing an app with a tree (folder-file) structure, on which I need to perform full-text searches with MongoDB. I did some research on best practices for tree structures and found this great article, but I still cannot decide which DB structure will fit my needs.
I have the following requirements in mind:
- I should be able to perform full-text search on individual folders, as well as on everything owned by a specific user
- The folders/files should be shareable, so I need to be able to perform full-text search on all items accessible to a specific user
I’ve been thinking about the following structures.
Structure 1
Fields of Users collection
1. _id - objectid
2. name - string
Fields of Folders collection
1. _id - objectid
2. name - string
3. owner - objectid
4. sharedWith - array of objectIds
5. location - objectid of parent folder, null if in root
6. createDate - datetime
Fields of File collection
1. _id - objectid
2. name - string
3. owner - objectid
4. sharedWith - array of objectIds
5. data - string
6. location - objectId of folder
7. createDate - datetime
So here are my questions:
- Should I model the tree structure with Parent References or Child References?
- Should I use one collection for both files and folders (with a type field), or should I separate them?
- Is it worth having only a folder collection and nesting the file documents in it?
These are my most important questions, though I would greatly appreciate any advice on how I can improve the structure.
Some of the answers depend on how you foresee the system being used. Without knowing more about your specific requirements, my answer aims at a generally flexible system that works reasonably well across a wide variety of use cases and does not assume any “shortcuts” (such as an absolute limit on the number of folders). More specifically:
- Should I model the tree structure with Parent References or Child References?
If you use parent references, the document representing a folder stays the same size no matter how many files the folder contains. If you use child references, you’ll need to update the folder document every time a file is created. This can introduce synchronization issues (two files being added to the same folder at the same time) or document size issues (imagine a folder with a million files in it). However, a “normalized” parent-reference structure makes it more expensive to do things like “find all folders/files nested under this root folder” without additional optimizations.
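To make the cost of "find everything under this folder" concrete, here is a minimal sketch of a level-by-level traversal over parent references. It uses an in-memory list as a stand-in for a collection, so the sample data and the helper names (`NODES`, `find_children`, `descendants`) are hypothetical; only the `location` field comes from the question. In a real database each call to `find_children` would be one query, so the number of round trips grows with the depth of the tree.

```python
# Hypothetical in-memory stand-in for a collection that uses parent
# references: each document stores only its parent's _id in "location".
NODES = [
    {"_id": "root", "type": "folder", "location": None},
    {"_id": "docs", "type": "folder", "location": "root"},
    {"_id": "pics", "type": "folder", "location": "root"},
    {"_id": "cv.txt", "type": "file", "location": "docs"},
    {"_id": "old", "type": "folder", "location": "docs"},
    {"_id": "cv_2010.txt", "type": "file", "location": "old"},
]


def find_children(nodes, parent_ids):
    """Stand-in for one query such as {"location": {"$in": parent_ids}}."""
    wanted = set(parent_ids)
    return [n for n in nodes if n["location"] in wanted]


def descendants(nodes, root_id):
    """Collect every node under root_id, issuing one "query" per tree level."""
    found, frontier, round_trips = [], [root_id], 0
    while frontier:
        children = find_children(nodes, frontier)
        round_trips += 1
        found.extend(children)
        # Only folders can contain further nodes, so only they are expanded.
        frontier = [c["_id"] for c in children if c["type"] == "folder"]
    return found, round_trips


found, trips = descendants(NODES, "root")
print([n["_id"] for n in found], trips)
```

One possible "additional optimization" is to store extra data on each node, such as the list of its ancestors, so that a whole subtree can be matched with a single query. MongoDB's `$graphLookup` aggregation stage can also perform this kind of recursive lookup on the server side.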
- Should I use one collection for both files and folders (with a type field), or should I separate them?
File systems typically represent both files and folders as “nodes” that carry additional data/type information. Splitting them into separate collections only makes sense if you need to run some very specialized operations on one of those data sets that separate collections would help with (I can’t think of any off the top of my head).
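A single collection also fits the full-text requirement well, because a `$text` query runs against one collection and needs a text index on it. The sketch below only builds the filter document for "search everything this user can access", so it runs without a database. The function name `build_search_filter` and the idea of a combined collection are assumptions for illustration; `owner`, `sharedWith` and `location` are the fields from the question.

```python
def build_search_filter(user_id, terms, folder_id=None):
    """Build a MongoDB filter for a text search over nodes a user can access.

    Assumes the collection has a text index, for example on "name" and "data".
    """
    query = {
        # Full-text match against the collection's text index.
        "$text": {"$search": terms},
        # A node is accessible if the user owns it or it is shared with them.
        # Matching a scalar against the "sharedWith" array tests membership.
        "$or": [{"owner": user_id}, {"sharedWith": user_id}],
    }
    if folder_id is not None:
        # Restrict to direct children of one folder (not the whole subtree).
        query["location"] = folder_id
    return query


print(build_search_filter("user-1", "quarterly report", folder_id="folder-9"))
```

With a driver such as pymongo, the returned dictionary could be passed to `find` on the combined collection. Searching a whole subtree instead of a single folder would need either the list of descendant folder ids or an extra field on each node, which this sketch does not cover.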
- Is it worth having only a folder collection and nesting the file documents in it?
You will lose the ability to access individual files without loading everything else that’s in that folder. This will also be problematic if the number of files per folder grows and your folder documents start getting very big. Separate documents that represent separate “nodes” of your file system are probably the way to go.
Tradeoff: if you know that you’ll have a rigid folder structure with a handful of folders and not too many documents, a nested structure could be convenient.
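The size concern can be illustrated with a rough calculation. MongoDB limits a single BSON document to 16 MB, so a folder document that embeds its files will eventually stop accepting new ones. The sketch below uses the length of the JSON encoding as an approximation of the document size (the real BSON size differs somewhat), and the helper names `nested_folder` and `approx_size` are hypothetical.

```python
import json

MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def nested_folder(name, file_count, bytes_per_file):
    """Build a folder document that embeds all of its files."""
    return {
        "name": name,
        "files": [
            {"name": f"file{i}.txt", "data": "x" * bytes_per_file}
            for i in range(file_count)
        ],
    }


def approx_size(doc):
    """Rough size estimate: bytes in the JSON encoding, not exact BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


small = nested_folder("notes", file_count=10, bytes_per_file=100)
large = nested_folder("archive", file_count=2000, bytes_per_file=10_000)

print(approx_size(small) < MAX_BSON_BYTES)
print(approx_size(large) < MAX_BSON_BYTES)
```

Here 2,000 files of about 10 KB each already carry roughly 20 MB of data, which is over the limit, while a handful of small files fits comfortably. That matches the tradeoff: nesting is only safe when the number and size of files per folder are known to stay small.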
When answering questions like these, it’s very helpful either to know all of your requirements up front or, if the requirements are vague, to develop a generally flexible system that’s easier to change and maintain once the requirements are better understood.
I generally find it useful to ask extreme questions, such as “what happens if I have a billion folders with a billion files each?”, or “what if I have a structure that’s nested a billion folders deep?”. Questions like that tend to illuminate the problem in helpful ways.