I am creating a design for different Geographical types where each type is just a little “place holder” in over all heirarchy.
One example is
Zip code Belongs to Territory Belongs to District
Other would be
Zip code belongs to State belongs to Some Region
We have two schools of thought one is that:
1 create different tables like Zipcode, Territory, Regions etc
- other is that just create one table which is called Geographical
Entity and then everything is just a type so you can assign a Type Id
to each geographical entity and self reference the table
Now, if I go by design #1 then I don’t have to worry about any non traditional decision. I can create one class in OO space and then map to each table. Also, imposing business level rules such as Territory can do such and such things but Distrtict can not. And also things like a Territory can belong to District but District can not belong to Territory are all contained in design itself.
However, If I go to Design #2. It is extensible such that if we have a new Geographical type called “XYZ” then we do not need to add a new table for that. We can simply add a new type into Geography type and it would be addressed.
I find Design #2 extensible but unnecessarily complex. Am I right thinking it that way? Or that’s the way to go?
5
There are at least two aspects of database design that your question touches on.
The first is the issue of class/subclass design. What is the best design for classes and subclasses when designing a relational database? (This issue is also known as types and subtypes or generalization and specialization).
The second is the issue of representing a hierarchy in relational data. What is the best design to make insertions quick and easy? What is the best design to make queries quick and easy, especially searching the subtree.
These two sound like the same issue, but they are not. There are situations where subclasses arise, but there is no hierarchy (except for the inheritance hierarchy). There are other situations that arise where there is a hierarchy, but all the participants are of one class, as far as the designer is concerned.
With regard to classes and subclasses, I recommend you look into Martin Fowler’s treatment of “table inheritance”, particularly “single table inheritance” and “class table inheritance”. There are benefits and drawbacks to each choice.
With regard to representing a hierarchy, I recommend you look into “adjacency list model” and “nested set model”. Adjacency list is so simple that most newbies learn the technique without ever learning its name. Nested sets is less well known, and is only useful under some special circumstances. It does, however, permit traversing a subtree without using recursion.
I’m not going to go into more detail here, because it’s not clear whether you are interested in both of these issues or just one of them.
1
Yes is can be, but beware of the following:
- Recursive queries (For example, in Sql Server, common table expression (CTE)
- Recursion in code (self referencing objects)
Both of these can be tricky to understand, implement, and support.
For your queries, it will not be a simple
Select * from Table
type of query. Also, most likely the object model in code will have self referencing objects that will require a root and tree/node structure to support.
However, I can say this model is appropriate. In a past project we implemented a self referencing geographic model to good success. As a precaution when we interviewed candidates we had a recursion code question to ensure they could handle the code and model as not everyone is familiar with this type of data structure.
I once worked on a system that modeled similar data. Not geographical, but hierarchical, where the hierarchy was not necessarily well-defined and entities might “skip” layers: A -> B -> C was just as valid as A -> C.
Using separate tables is great when you can guarantee the data will match the structure, and in fact must match the structure. However, that appears not to be the case here.
Things to keep in mind:
- Querying an entire hierarchy from leaf to root cannot be easily done. SQL Server has CTEs and Oracle has hierarchical queries, but the data is not in a consistent state the way an ORM layer would expect: you have arbitrarily many records of arbitrary types not defined by the table itself but by one of its fields. In other words, the type of a record is defined by the record itself, not the table in which it resides.
- Other areas of an application require extra care. It is not as simple as grabbing the “state” or “postal code” field and linking it to a UI element, you have to look at a code/value pair to determine what the data represents. Extra logic is required to put data from the database in the proper place for your model and view.
- Joins become almost impossible to do correctly in all cases which would not be true for the strongly-typed separate table solution.
Given the information in your question I would do it this way, but keep in mind it introduces complexity that must be considered and managed.
2