I have just started using a NoSQL document-based database (MongoDB) and I’m curious about the best practices for designing databases.
I presume the architecture should be different from relational databases? Should I still aim for a normalized database?
For example, I have a particular use case:
I have a user with a rental history (an array of addresses). Should that array be stored on the user document, or in a separate collection with a shared key?
An appropriate approach to NoSQL database design is DDD (Domain-Driven Design).
To people who are used to designing for an RDBMS, NoSQL can look like a set of SQL anti-patterns; it makes more sense when considered within the scope of DDD.
Depending on how the addresses are used, you may define an address as a value object inside your rental history model/entity.
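As a rough illustration of the value-object idea, here is a minimal Python sketch. The class names (Address, RentalHistory) and the fields are assumptions made up for this example, not something from the question; the point is only that an address has no identity of its own and lives inside the entity that owns it.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass(frozen=True)
class Address:
    """Value object: immutable, no id, compared by its field values."""
    street: str
    city: str
    postcode: str


@dataclass
class RentalHistory:
    """Entity owned by a user; the addresses are stored inside it."""
    user_id: str
    addresses: List[Address] = field(default_factory=list)

    def to_document(self) -> dict:
        # asdict() recursively converts nested dataclasses to plain dicts,
        # which is the shape a document store would persist.
        return asdict(self)


history = RentalHistory(user_id="u1")
history.addresses.append(Address("1 Main St", "Springfield", "12345"))
doc = history.to_document()
```

Because Address is frozen, "changing" an address means replacing it with a new value, which matches how value objects are usually treated in DDD.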
Here are some references that might clarify how to think about design with NoSQL:
- A Managers Guide to NoSQL
- Wakanda: NoSQL for Model-Driven Web applications – NoSQL matters 2012
- Addressing the NoSQL Criticism
- Our experience with Domain Events
TL;DR
Normalization in RDBMS allows you to leverage the strengths of the relational paradigm.
Denormalization in NoSQL allows you to leverage the strengths of the NoSQL paradigm.
Long answer
RDBMSs are great because they let you model unique structured entities (mutable or not) and their relationships with one another. This means it’s very easy to work at the entity level: updating an entity’s properties, inserting another one, deleting one, etc. But they are also great for aggregating entities dynamically: a dog with its owner, a dog with the homes it has resided in, etc. The RDBMS gives you tools to facilitate all this. It’ll join for you, it’ll handle atomic changes across entities for you, etc.
NoSQL databases are great because they let you model semi-structured or unstructured aggregates and dynamic entities. This means it’s very easy to model ever-changing entities, entities that don’t all share the same attributes, and hierarchical aggregates.
To model for NoSQL, you need to think in terms of hierarchies and aggregates instead of entities and relations. So you don’t have a person, rental addresses, and a relation between them. You have rental records which aggregate, for each person, the rental addresses they’ve had.
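To make the two shapes from the question concrete, here is a small sketch using plain Python dicts as stand-ins for documents. The field names (rental_history, user_id, from, to) and the sample data are assumptions for illustration only.

```python
# Option A: the rental history is embedded in the user document (one aggregate).
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "rental_history": [
        {"street": "1 Main St", "city": "Springfield", "from": "2019-01", "to": "2021-06"},
        {"street": "9 Oak Ave", "city": "Shelbyville", "from": "2021-07", "to": None},
    ],
}

# Option B: a separate collection linked by a shared key (closer to relational).
users = [{"_id": "u1", "name": "Alice"}]
rentals = [
    {"user_id": "u1", "street": "1 Main St", "city": "Springfield", "from": "2019-01", "to": "2021-06"},
    {"user_id": "u1", "street": "9 Oak Ave", "city": "Shelbyville", "from": "2021-07", "to": None},
]


def current_address_embedded(user):
    """Everything needed is already inside the one aggregate."""
    return next((r for r in user["rental_history"] if r["to"] is None), None)


def current_address_referenced(user_id, rentals):
    """The second collection has to be searched by the shared key."""
    return next(
        (r for r in rentals if r["user_id"] == user_id and r["to"] is None),
        None,
    )
```

With option A a single read returns the person and their whole history; with option B the application (or the database, if it supports it) has to stitch the two collections together.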
You need to ask: what data will I need to change together? What data logically groups the other data? What’s the logical entry point towards the rest of the data? In your case, a person sounds like a good aggregate.
NoSQL lets you say: store a thing that has other things, which have things of their own. Give me the whole hierarchy of things back. Let me change it as I please, then replace the whole hierarchy with my changed one. That’s pretty much all it gives you. Why is that useful? It’s useful if what you have is a hierarchy of things that you always interact with as a whole, or if you need to scale massively.
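The store / fetch / change / replace cycle described above can be sketched with a toy in-memory store. AggregateStore is a hypothetical stand-in written for this example, not a real driver API; a real document database would do the same thing keyed on the document id.

```python
import copy


class AggregateStore:
    """Toy in-memory stand-in for a document collection (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        # Store or replace the whole document under its _id.
        self._docs[doc["_id"]] = copy.deepcopy(doc)

    def get(self, doc_id):
        # Hand back a private copy of the whole hierarchy.
        return copy.deepcopy(self._docs[doc_id])


store = AggregateStore()
store.put({"_id": "u1", "name": "Alice", "rental_history": [{"city": "Springfield"}]})

user = store.get("u1")                                  # give me the whole hierarchy
user["rental_history"].append({"city": "Shelbyville"})  # change it as I please
store.put(user)                                         # replace it with the changed one
```

Nothing is persisted until the final put, so the aggregate is the unit of both reading and writing.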
Everything else an RDBMS gives you, you’ll have to implement manually in code and in your schema. You’ll have to join in code if you ever need an aggregate of aggregates. You’ll have to parse if you need only part of an aggregate. You’ll need to check uniqueness yourself if you don’t want duplicate things. You’ll need to implement your own transactional logic when working across aggregates, etc.
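As a sketch of what "doing it yourself" looks like, here is an application-side join and a manual uniqueness check over plain Python lists standing in for two collections. The helper names (join_users_with_rentals, insert_unique) and the data are made up for this example.

```python
users = [
    {"_id": "u1", "name": "Alice"},
    {"_id": "u2", "name": "Bob"},
]
rentals = [
    {"user_id": "u1", "city": "Springfield"},
    {"user_id": "u1", "city": "Shelbyville"},
    {"user_id": "u2", "city": "Springfield"},
]


def join_users_with_rentals(users, rentals):
    """Application-side join: group rentals by user_id, then attach them."""
    by_user = {}
    for rental in rentals:
        by_user.setdefault(rental["user_id"], []).append(rental)
    return [{**user, "rentals": by_user.get(user["_id"], [])} for user in users]


def insert_unique(collection, doc, key):
    """Manual uniqueness check: reject a doc whose key value already exists."""
    if any(existing[key] == doc[key] for existing in collection):
        raise ValueError(f"duplicate {key}: {doc[key]!r}")
    collection.append(doc)


joined = join_users_with_rentals(users, rentals)
```

Note that this check-then-insert is not safe under concurrent writers, which is exactly the kind of guarantee you give up. Some stores, MongoDB included, do offer unique indexes and an aggregation-stage join, so it is worth checking what your database provides before hand-rolling this.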
So in NoSQL, having one big document (the equivalent of one big table) with everything you need is the way to go, since that is the only level at which atomicity, and performance too, is guaranteed. Figuring out your relations early is important. This is what denormalization is.
In an RDBMS, denormalization effectively cripples your DB into a NoSQL one. So normally you want the opposite, that is, normalization. If you don’t, you should be using a NoSQL DB instead, unless you need a bit of both.