Consider a table in which a column holds a monetary value in some rows and a link to the value of another record in other rows, so that we have the following:
TABLE 1

Id  Product  Composite
1   Apple    False
2   Orange   False
3   Pear     True

TABLE 2

Product  Date        Price
1        2010-01-01  0.45
2        2010-01-01  0.50
3        2010-01-01  1.00
1        2010-01-02  0.46
2        2010-01-02  0.49
3        2010-01-02  2.00
The tables above say that the prices of an Apple and an Orange are given on each date by the corresponding Price column in Table 2, so that on 2010-01-01 an Apple costs 0.45 and an Orange costs 0.50. The price of a Pear, however, is 0.45 on 2010-01-01 and 0.49 the following day. For a composite product the Price column holds the Id of another product, so the Pear takes the Apple’s price on the first day (Price 1.00, meaning Id 1) and the Orange’s price on the second day (Price 2.00, meaning Id 2).
What are the pros and cons of this approach?
4
The approach is, frankly, horrible. As I understand it, you want the Price column to be interpreted as a foreign key to Product.Id, and then “redirect” the query for Pear prices to a query for Apple prices or for Orange prices. This requires complicated logic in your SQL code for what should be a straightforward table look-up, and since the schema itself does nothing to flag this convention, it is very easy for someone else to overlook it and simply assume that the price is $1.00.
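To make the extra logic concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names `product` and `price`, the 0/1 encoding of Composite, and the helper names `naive_price` and `effective_price` are assumptions made for illustration; only the data comes from the question.

```python
import sqlite3

# In-memory database holding the two tables from the question.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, composite INTEGER);
    CREATE TABLE price (product INTEGER, date TEXT, price REAL);
    INSERT INTO product VALUES (1, 'Apple', 0), (2, 'Orange', 0), (3, 'Pear', 1);
    INSERT INTO price VALUES
        (1, '2010-01-01', 0.45), (2, '2010-01-01', 0.50), (3, '2010-01-01', 1.00),
        (1, '2010-01-02', 0.46), (2, '2010-01-02', 0.49), (3, '2010-01-02', 2.00);
""")

# The look-up most people would write: read the Price column as a price.
NAIVE = """
    SELECT pr.price
    FROM price AS pr
    JOIN product AS p ON p.id = pr.product
    WHERE p.name = ? AND pr.date = ?
"""

# The look-up the design actually requires: for composite products the
# Price column is reinterpreted as a product id and joined a second time.
RESOLVED = """
    SELECT CASE WHEN p.composite = 1 THEN ref.price ELSE pr.price END
    FROM price AS pr
    JOIN product AS p ON p.id = pr.product
    LEFT JOIN price AS ref
           ON p.composite = 1
          AND ref.product = CAST(pr.price AS INTEGER)
          AND ref.date = pr.date
    WHERE p.name = ? AND pr.date = ?
"""

def naive_price(name, date):
    row = conn.execute(NAIVE, (name, date)).fetchone()
    return row[0] if row else None

def effective_price(name, date):
    row = conn.execute(RESOLVED, (name, date)).fetchone()
    return row[0] if row else None
```

With this data, `naive_price("Pear", "2010-01-01")` returns the stored 1.0, while `effective_price` follows the redirection to the Apple row. Every query that touches prices would need the second form.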
But quite apart from that, you don’t even save anything, neither time nor storage. Instead of abusing the Price column as a foreign key, you could just store the actual price in that column at no extra cost! You would end up with a table that has duplicated values, but that is normal and expected in a large data set.
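As a rough sketch of that alternative, again with Python's sqlite3 module: the Pear rows simply carry the actual price, and the look-up is a single join. The table names and the helper `price_of` are hypothetical, and the Pear prices are the ones the question says should result (0.45 and 0.49).

```python
import sqlite3

# Same data as in the question, except that the Pear rows store the
# real price instead of a product id. Names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (
        product INTEGER REFERENCES product(id),
        date    TEXT,
        price   REAL,
        PRIMARY KEY (product, date)
    );
    INSERT INTO product VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Pear');
    INSERT INTO price VALUES
        (1, '2010-01-01', 0.45), (2, '2010-01-01', 0.50), (3, '2010-01-01', 0.45),
        (1, '2010-01-02', 0.46), (2, '2010-01-02', 0.49), (3, '2010-01-02', 0.49);
""")

def price_of(name, date):
    """Return the price of a product on a date, or None if no row exists."""
    row = conn.execute(
        "SELECT pr.price FROM price AS pr "
        "JOIN product AS p ON p.id = pr.product "
        "WHERE p.name = ? AND pr.date = ?",
        (name, date),
    ).fetchone()
    return row[0] if row else None
```

The duplicated 0.45 and 0.49 values take the same space as the 1.00 and 2.00 they replace, and the Composite flag is no longer needed to read a price.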
So overall, I can see only cons and no pros here.
2
I think the word “purpose” phrases the question on a level that’s unnecessarily abstract.
On a simpler level, we can say that all the values in a single column should be drawn from the same domain, and have the same semantics. So the same value in the column in two different rows ought to mean the same thing.
In particular, if the values in a column are foreign keys, then they should all be foreign keys to the same table. I’ve seen the opposite done, where references to multiple tables are stored in a single column. The result is invariably a mess. A companion column is always needed to disambiguate, and this leads to one of two problems.
Whenever the disambiguation is done correctly, the code is more complicated than it needs to be, and also runs more slowly than it should. And of course, if the programmer forgets to disambiguate correctly on a join, the wrong stuff gets joined together. I’ve seen both cases.
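Both failure modes can be illustrated with a small sketch in Python's sqlite3 module. The schema here is entirely hypothetical (a `note` table whose `target_id` refers to either `customer` or `supplier`, with `target_type` as the companion column); it is not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
    -- target_id points at customer OR supplier; target_type says which.
    CREATE TABLE note (
        id          INTEGER PRIMARY KEY,
        target_type TEXT,
        target_id   INTEGER,
        body        TEXT
    );
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO supplier VALUES (1, 'Acme');
    INSERT INTO note VALUES
        (1, 'customer', 1, 'late payment'),
        (2, 'supplier', 1, 'late delivery');
""")

# Forgetting the companion column: the supplier's note is attached to
# the customer who happens to share the same id.
careless = conn.execute(
    "SELECT c.name, n.body FROM note AS n "
    "JOIN customer AS c ON c.id = n.target_id "
    "ORDER BY n.id"
).fetchall()

# Disambiguating correctly: every join must carry the extra condition.
careful = conn.execute(
    "SELECT c.name, n.body FROM note AS n "
    "JOIN customer AS c ON c.id = n.target_id AND n.target_type = 'customer' "
    "ORDER BY n.id"
).fetchall()
```

Here `careless` contains two rows for Alice, one of which belongs to Acme, while `careful` contains only the customer's own note. Note also that `target_id` cannot be declared as a real foreign key, since it has no single table to reference.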
Foreign keys are the most obvious case, but even in other cases the need to disambiguate is just an awkward consequence of introducing ambiguity in the first place. If you keep different data in different columns, you don’t have to disambiguate.