I’m developing a tool that handles (electrical) parts. The parts can be created, viewed, modified, deleted, grouped and so on…
In order to make this question useful for future visitors I like to keep this question universal since managing parts in a DB is very common no matter what parts are in the DB (CDs, cars, food, students, …).
I am thinking of 3 different DB designs:
-
Using a parts table and derived tables for specialized part attributes.
Parts (id, part_type_id, name) PartTypes (id, name) Wires (id, part_id, lenght, diameter, material) Contacts (id, part_id, description, picture)
-
Using only specialized part tables.
Wires (id, name, lenght, diameter, material) Contacts (id, name, description, picture)
-
Using a Parts-, PartTypes-, ValueTypes- and PartValues table that contain all values.
PartTypes (id, name) ValueTypes (id, part_type_id, name) Parts (id, part_type_id, name) PartValues (part_id, value_type_id, value)
Which one to prefer and why? Or is there a better one?
I am concerned about the DB queries. I don’t want the queries become overly slow or complicated.
Update
The number of types in the DB are pretty much given and static since they rest on a international standard and will be enhanced seldomly.
2
Option 3: (sometimes)
Option 3 is the “EAV” design. In theory it is nice because the fields are taken out of the table structure and become data. But it gives terrible performance. It also disallows the use of proper indexing. And it makes queries much more complicated.
I would only use EAV in special circumstances. I have used EAV to calculate auxiliary parts needed for orders and it worked well. But be very weary of using it as the design for your core tables.
Option 2: (never?)
Option 2 is a no no. What about the shared fields? Are you going to duplicate the table structure for every shared field? It would require you to include unions in reports of the entire system.
Option 1: (winner!)
Option 1 may seem a little too basic but it is probably the best bet for your core tables. All the parts use the same master table for shared fields so it avoids unions in your reports. It has great performance allowing the proper use of indexing. Queries are in the traditional style and are simple.
The downside of option 1 is you can’t add fields dynamically. But do you really want to? By dynamically adding fields you are performing database design at run-time.
3
I would tend to not option #3.
Option #3 is name-value pair setup violating normalization.
Ideally, one attempts to have some level of normalization of the database. Strive for complete normalization and then denormalize as necessary when it is identified for customization or performance issues.
Consider the query “what are the name and part IDs for all wires made of copper”
Structure #1 is
select
name, parts.id
from
wire, parts
where
wire.material = 'copper'
and wire.part_id = parts.id
Structure #2 is
select id, name from wire where material = 'copper'
Structure #3 is
select
parts.name,
parts.id,
from
parts, part_types, part_values, value_types
where
part_types.name = "wire"
and parts.part_type_id = part_types.id
and value_types.name = "material"
and value_types.id = part_values.type_value_id
and part_values.value = "copper"
Consider also the complication of inserts and deletes from the system.
Some further reading on why not #3 — The curse of the name value pair
4
I go option 3
Option 1 is bad because you don’t want your joins to be based off a filed value. ( ie If type ="Wire" join to TblWire
)
Option 2 is bad because you have no way of reporting on your inventory as a whole
12
I would start with a data/object model allowing inheritance, and then use a standard object-relational mapping. This way you get a base class Parts
and sub-classes like Wires
, Contacts
etc. Now, if you apply a “map-each-class-to-own-table” strategy, you get option 1, which is the most “normalized” solution and should be the canonical strategy if you don’t have any more information about the queries you expect.
Option 2 is what you get when applying a “map-each-concrete-class-to-own-table” approach. This can avoid “joins” and may perform better for some kind if queries (especially queries for just one “part type”), on the other hand it makes generic handling with all parts harder and slower. Avoid this if you don’t have any special reasons for it.
Option 3 is what you need only if you want the user to change the number of part types at run time – if you don’t expect that requirement, option 3 will be a perfect example for over-engineering things.
With NOSQL DB database (like MongoDB for example) you will just need one set named “Parts”.
Each part in that set is so called document – record with variable set of fields:
{
"_id": ObjectId("4efa8d2b7d284dea1"),
"partType": "wire",
"length": 102.5,
"diameter": 1.5,
"material": "silver"
},
{
"_id": ObjectId("4efa8d2b7d284sjsq23d"),
"partType": "contact",
"description": "something",
"picture": Binary(...)
},
I think that this is the most natural data storage for the task you describe.
Definitely go with option 1 but with a few very simple modifications:
Parts (id, part_type_id, name)
PartTypes (id, name)
Wires (id, part_id, part_type_id, lenght, diameter, material)
Contacts (id, part_id, part_type_id, description, picture)
You can then use CHECK constraints and DEFAULT values to ensure that the part_type_id is correct, and then you can join on both part_type_id and part_id. This avoids having a conditional join based on only one table, and if you need to add a part_type_id to wires (say we are subdividing that part and adding another table of extended attributes) the default and check constraints can be changed.
2
Option 3 is more generic and can accommodate more use cases.
Going option 3 you may need more joins and complex queries for simple features, in option 2 you’d need complex queries for “big” features like inventory and reports, and may need use unions to accomplish that.
You can always simplify you queries in options 3 using Views, if you very often needs only the Wire or Contact, make a View for each of them. You can optimize it if it becomes necessary.