I see a lot of talk in the OOP world about principles and laws such as Open/Closed and Loose Coupling, and I can understand why they are so highly valued. However, I seem to have run into a problem applying these principles once I start to include relational databases.
For example, if I have an application that we are able to separate cleanly into multiple components (Open/Closed design) which work with each other only when needed, I start to run into problems when trying to enforce referential integrity.
If I enforce a relational setup, with integrity guaranteed through foreign keys and so on, the tables become dependent on each other. Gradually my OOP application logic starts to fall apart and becomes one big system.
I can of course remove all references to tables, but then I risk making data redundant and start to repeat code/data, which then goes against the Don’t Repeat Yourself principle.
Have you come across such problems before? If so, how did you handle them? Perhaps there is a way to achieve both referential integrity and OOP Open/Closed components.
Thanks
My suggestion:
You have to enforce referential integrity with FK. Let the RDBMS do that work and let the classes throw exceptions.
Tables being related has nothing to do with the coupling of classes or with SOLID principles. Those are concerns of the classes.
Tables are orthogonal to classes.
Don’t overengineer things.
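As a rough illustration of the first point in the suggestion above (let the RDBMS enforce the foreign key and let the application code throw), here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for whatever RDBMS is in use; the tables and the names `make_db`, `place_order` and `UnknownCustomerError` are hypothetical and not taken from the question.

```python
import sqlite3


class UnknownCustomerError(Exception):
    """Hypothetical domain-level error raised when the database rejects an order."""


def make_db() -> sqlite3.Connection:
    # In-memory database with an invented two-table schema.
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, customer_id: int) -> int:
    """Insert an order; the database, not this code, checks that the customer exists."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
            return cur.lastrowid
    except sqlite3.IntegrityError as exc:
        # Translate the driver error into something the calling classes understand.
        raise UnknownCustomerError(f"no customer with id {customer_id}") from exc
```

The class layer contains no integrity check of its own here. It only translates the database's refusal into an exception, which is one way to read "let the classes throw exceptions".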
First of all: you are right, this is a real problem. As so often, design issues surface when you try to test things. Testing is usually fairly easy if you have some design skills and OOP-based code, but things get tricky once the database gets involved.
One symptom is that if you want to have an entity A (i.e. an object persisted in a database) you need the full tree of dependencies and don’t have an easy way to mock those.
The reason for this kind of problem is that there is no separation between runtime and compile time in databases, thus a dependency you want to have/use in runtime is there all the time.
A theoretical solution is to create a database system that actually has something like interfaces. But that is probably out of scope for most of us.
The more realistic solution is to separate the database very strictly from the rest of the code, thus isolating the big convoluted blob that is commonly referred to as database schema from the nicely structured application.
The first step is to have Repositories that return your entities and leak no information whatsoever about where and how these entities get persisted. A good way to ensure this is to have two implementations of each: one actually using your database and one using, for example, simple HashMaps for storage. The latter are also very useful for testing. Be wary of any frameworks bleeding into your domain code. Hibernate, for example, has a habit of doing that by means of lazy loading and similar mechanisms.
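A minimal sketch of that first step, written in Python with the built-in `sqlite3` module standing in for the real database and a `dict` playing the role of the HashMap. The names `Customer`, `CustomerRepository`, `InMemoryCustomerRepository`, `SqliteCustomerRepository` and `greeting` are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    id: int
    name: str


class CustomerRepository(Protocol):
    """The only thing domain code sees: no tables, no SQL, no connection."""

    def save(self, customer: Customer) -> None: ...

    def find(self, customer_id: int) -> Optional[Customer]: ...


class InMemoryCustomerRepository:
    """Dictionary-backed implementation, handy for tests."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = customer

    def find(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)


class SqliteCustomerRepository:
    """Database-backed implementation with the same public methods."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, customer: Customer) -> None:
        with self._conn:  # one transaction per save
            self._conn.execute(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                (customer.id, customer.name),
            )

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    """Domain code that depends on the interface only."""
    customer = repo.find(customer_id)
    return f"Hello, {customer.name}" if customer else "Hello, stranger"
```

Because `greeting` accepts anything that satisfies `CustomerRepository`, the same function runs unchanged against either implementation, so tests can use the in-memory one without touching the schema.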
The second step is to identify aggregates in the sense of Domain Driven Design, i.e. bags of entities that are controlled by one entity and its repository, the aggregate root. For example, LineItems of Orders should probably only be loaded and updated via an Order, never on their own. But entities that belong to a different aggregate (for example customers) should get loaded via their own repository and not via some kind of magic lazy loading on navigation. This way you can store different entities in different datastores and your application doesn’t care. If they happen to be in the same database (as they probably are) they can have foreign keys, no problem.
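The aggregate idea could look roughly like this in Python. This is a sketch only: `Order`, `LineItem`, `InMemoryOrderRepository` and all field names are hypothetical, and a real persistent repository would sit behind the same methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int


class Order:
    """Aggregate root: line items are only created and read through the order."""

    def __init__(self, order_id: int, customer_id: int) -> None:
        self.id = order_id
        # The customer lives in another aggregate, so only its id is kept here,
        # not an object reference that would invite navigation or lazy loading.
        self.customer_id = customer_id
        self._items = []

    def add_item(self, sku: str, quantity: int, unit_price_cents: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items.append(LineItem(sku, quantity, unit_price_cents))

    @property
    def items(self) -> tuple:
        # Read-only view, so callers cannot bypass add_item.
        return tuple(self._items)

    def total_cents(self) -> int:
        return sum(i.quantity * i.unit_price_cents for i in self._items)


class InMemoryOrderRepository:
    """Loads and saves whole orders; there is deliberately no LineItem repository."""

    def __init__(self) -> None:
        self._orders = {}

    def save(self, order: Order) -> None:
        self._orders[order.id] = order

    def find(self, order_id: int):
        return self._orders.get(order_id)
```

To show the customer for an order, the caller would take `order.customer_id` and ask a separate customer repository for it. Whether both tables share a database and a foreign key is then invisible to this code.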
There might be one technical challenge with this approach, caused by constraints: normally constraints get enforced on every single DML statement. If that causes you issues, you might consider deferred constraints, which only get enforced on commit. This might actually benefit performance, but it can also make debugging more difficult, because now you might get foreign key violations on commit, long after your DML statements happened.
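The difference between immediate and deferred enforcement can be tried out with Python's built-in `sqlite3` module, as in the sketch below. SQLite is only a stand-in here: support and syntax for deferrable constraints vary between database systems, and the schema and the helper names `make_db` and `run_transaction` are made up for the example.

```python
import sqlite3


def make_db(deferred: bool) -> sqlite3.Connection:
    # isolation_level=None puts the driver in autocommit mode, so the
    # transaction boundaries below are exactly the BEGIN/COMMIT we issue.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    mode = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        f" customer_id INTEGER NOT NULL REFERENCES customer(id) {mode})"
    )
    return conn


def run_transaction(conn: sqlite3.Connection, statements: list) -> str:
    """Run the statements in one transaction and report where a violation surfaced.

    Returns "statement" if a DML statement was rejected, "commit" if the
    violation only showed up at COMMIT, and "ok" if everything went through.
    """
    conn.execute("BEGIN")
    try:
        for sql in statements:
            try:
                conn.execute(sql)
            except sqlite3.IntegrityError:
                return "statement"
        try:
            conn.execute("COMMIT")
        except sqlite3.IntegrityError:
            return "commit"
        return "ok"
    finally:
        # A failed statement or a failed COMMIT leaves the transaction open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")


# Insert the child row first, then (optionally) the parent it refers to.
child_only = ["INSERT INTO orders (id, customer_id) VALUES (1, 99)"]
child_then_parent = child_only + ["INSERT INTO customer (id) VALUES (99)"]
```

With the immediate constraint, both statement lists are rejected at the first `INSERT`. With the deferred constraint, `child_then_parent` commits cleanly, while `child_only` fails only at `COMMIT`, which is the late error the answer warns about.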
I think that OOP and RDBMS are not good friends. OOP seems to me more apt to deal with “rows” in each table. RDBMS deals with “sets of rows”.
Nevertheless, there are lots of frameworks that try to overcome these problems, but they lose the “purity” of OOP.