I have several enums that need to be defined and shared between collections.
A practical example:
- There are X colors available: “Light Blue”, “Red”, “Purple”, …
- people can like items of several colors
- items can use several colors
I need to be able to change the name of the color “Red” to “Cherry Red” while keeping all the associations. I also need to expose these via an API as values, not as enum IDs. Most of the internal “queries” work well with enum IDs.
There are a number of similar properties (fewer than 20).
The straightforward approach I can think of:
- have a collection to store {_id, color_name }
- people/items collection have an array of colors that stores ids
- when exposing people / items, an additional aggregation will get the color names and substitute them for the ids in the array (fairly rare operation)
- when computing “which people may like this item’s colors” I can use ids ( common operation)
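To make the first approach concrete, here is a minimal sketch using plain Python dicts as stand-ins for the three collections. The document shapes, the field name `colors`, and the helper names `with_color_names` and `people_liking_item` are assumptions for illustration only, not anything MongoDB prescribes.

```python
# Hypothetical in-memory stand-ins for the three collections.
colors = {1: "Light Blue", 2: "Red", 3: "Purple"}  # _id -> color_name
people = [
    {"_id": "alice", "colors": [1, 2]},
    {"_id": "bob", "colors": [3]},
]
items = [{"_id": "scarf", "colors": [2, 3]}]


def with_color_names(doc, colors):
    """Return a copy of doc with color ids replaced by names (API view)."""
    return {**doc, "colors": [colors[c] for c in doc["colors"]]}


def people_liking_item(item, people):
    """Ids of people who like at least one of the item's colors."""
    wanted = set(item["colors"])
    return [p["_id"] for p in people if wanted & set(p["colors"])]


# Renaming touches a single lookup entry; the stored ids are unchanged.
colors[2] = "Cherry Red"
print(with_color_names(items[0], colors))
print(people_liking_item(items[0], people))
```

The matching step never looks at names, so the rename does not affect it; only the API view picks up the new name.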
Another approach would be to use color names directly, and use a transaction to update the color name everywhere (this is a rare operation, but it happens).
I’m concerned about the speed of computing “which people like this item’s colors” when using names directly.
I’m more experienced with RDBMS (probably shows in my first approach).
Questions:
- which of the 2 approaches would be more appropriate for a NoSQL database ?
- is there a speed penalty using strings versus object ids versus numeric ids for indexing ?
- is there another (better) approach that I’m missing ?
Edit: the analogy is imperfect. Even if “Red” and “Cherry Red” would be different colors in reality, my requirements make them the same. Think of someone evaluating the color and entering “Red”, and then someone with better knowledge revising the document and saying it’s not “Red”, it’s “Cherry Red”. The change would not apply everywhere, but it would apply to all users’ likes and to products that are still on the assembly line, so to speak.
The underlying question is whether your color is a value object (any color with the same value is considered to be the same color) or an entity (each color has an identity that is kept despite value change):
- Value objects are in principle stored directly in a MongoDB document. Think of it as you would a numeric amount: the value is there and immediately usable.
- Entities are generally in their own collection, so that a repository could easily find them by id. They may also be nested within their aggregate roots.
Value vs entity is a semantic question with some other facets:
- If a user claims to like red, it would not be accurate to change the preference to cherry red or blue because someone changed the name of the color.
- What if you suddenly end up with two colors having the same name but different ids? Are they the same colors or not?
- If you want to make your app multilingual, would the users expect to see the color in the original language, or translated into their language?
- Is there a need for more information about the color, such as an RGB value or a pantone color code?
- Could there be a need for approximate matches (e.g. like “RED”)?
The fact that you are working with enums suggests that you have a limited number of colors. You could easily load the full collection into memory and convert between colors and ids on the fly. Moreover, numeric ids are expected to be more compact and more efficient to compare than strings. But if the usage patterns involve many more writes than reads, you may see no benefit, only overhead. In either scenario, I believe the performance difference would stay minimal.
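As a rough sketch of the in-memory conversion, a small two-way lookup could look like the following. The class name `ColorRegistry` and its methods are hypothetical, and the rows are assumed to have been read once from the color collection at startup.

```python
class ColorRegistry:
    """Hypothetical in-memory cache of a small color collection."""

    def __init__(self, rows):
        # rows: iterable of (id, name) pairs, e.g. read once at startup.
        self._name_by_id = dict(rows)
        self._id_by_name = {name: cid for cid, name in self._name_by_id.items()}

    def name_of(self, color_id):
        return self._name_by_id[color_id]

    def id_of(self, name):
        return self._id_by_name[name]

    def rename(self, color_id, new_name):
        # Keep both directions in sync; the id itself never changes.
        if new_name in self._id_by_name:
            raise ValueError(f"name already in use: {new_name}")
        old_name = self._name_by_id[color_id]
        del self._id_by_name[old_name]
        self._name_by_id[color_id] = new_name
        self._id_by_name[new_name] = color_id


registry = ColorRegistry([(1, "Light Blue"), (2, "Red"), (3, "Purple")])
registry.rename(2, "Cherry Red")
print(registry.name_of(2))
print(registry.id_of("Cherry Red"))
```

Rejecting a name that is already in use is one possible answer to the "two colors with the same name" question above; whether that is the right rule depends on the semantics you settle on.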
One thing is for sure: taking all the past records and rewriting them with a new color name might not be the best idea. Because documents are dynamic in size and structure, this would require rewriting a lot of records, not to mention the concurrency issues.
Denormalisation is common in document stores. It is quite common for newer documents to have more attributes than older ones, or for older documents to say red where newer ones say cherry red. Don’t turn this dynamic flexibility into a major inconvenience just to reproduce RDBMS practices 😉
P.S.: documents are exchanged as text. If you send a document to MongoDB, it will have to parse the string anyway. MongoDB stores the document in a binary BSON format, in which a numeric id is compared more quickly than a string. However, an object id is 12 bytes (so longer than “red” and “cherry red”). This is why I think that the performance difference either way should be minimal, and the focus should be on the semantics first.
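For a rough feel of the sizes involved, the sketch below computes the encoded size of a BSON string value, which per the BSON format is a 4-byte length prefix, the UTF-8 bytes, and a terminating NUL byte, and lists the fixed sizes of the numeric and ObjectId types next to it. The function and constant names are made up for illustration, and the per-field overhead (type byte and key name) is left out because it is the same for every type.

```python
def bson_string_value_size(text):
    """Bytes used by a BSON string value: int32 length + UTF-8 bytes + NUL."""
    return 4 + len(text.encode("utf-8")) + 1


# Fixed-size BSON value types, in bytes.
BSON_INT32_SIZE = 4
BSON_INT64_SIZE = 8
BSON_OBJECTID_SIZE = 12

for name in ("Red", "Cherry Red", "Light Blue"):
    print(name, bson_string_value_size(name))
```

With the length prefix and terminator counted, short names land in the same range as an ObjectId, which is consistent with expecting only a small difference either way.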