Ok, I am a newbie and don’t really think “dimensionally” yet, I have most of my initial schema roughed out but I keep flipping back and forth on one table.
I have a Contract table and it has a quantity column (tonnes), and a net price column, which need to be summed up a bunch of different ways, and the contract has lots of foreign keys (producer, commodity, futures month etc.) and dates so it appears to be a fact table. Also the contract is never updated, if that makes a difference.
However, we create cash tickets which we use to pay out part or all of the contract and they have a contract ID on them so then the contract looks like a dimension in the cash ticket’s star schema.
Is this a problem? Any ideas on the process to resolve this, because people don’t seem to like the idea of joining two fact tables.
Should I put producerId and commodityId on the cash ticket? It would seem really weird not to have a contractID on it.
3
Should I put producerId and commodityId on the cash ticket?
If each cash ticket does directly relate to a contract, and producer and commodity are related to the contract, then it does not make sense to move these from the contract to the cash ticket. This is information about the contract so it should stay with the contract.
I work with a similar system and I think this occurs where you have two distinct groups of facts which are related in a one to many relationship. Here, one contract is related to many cash tickets. That means that the contracts can be used to filter the cash tickets and vice versa despite both tables storing facts.
The only problem I see here is that depending on the tech there may be a limit to the number of records in a table on which a dimension can based; we hit the Analysis Services dimension size limit in this way (with a dimension of many millions of members… I don’t know if this changed in SS 2012). So by this I mean, if your tables become very large, you may not be able to use them as dimensions directly.
You can, however, use dimensions related to one fact table to slice a linked fact table. For instance you can apply a dimension filter to the contracts table, and show a count of the connected records in the cash tickets table, etc.
I don’t see a problem with this and we have not come across any unexpected issues; some queries can be a bit resource intensive but this has been manageable we have not found a better way to achieve the same results.
1
You will simplify and amplify your thinking if you forget everything you know about “dimensionality”. The notion of Star Schema doesn’t exist in relational theory. The sooner you drop it, the sooner relations will stand alone clearly in your mind.
Your question, essentially, is: is a contract ID a property of a cash ticket? It sounds like it is. If adding it doesn’t denormalize your table, if it’s functionally dependent on the key, adding it is the right thing to do, full stop. Any consideration of star-ness or fact/dimension duality can be safely discarded.
I have a Contract … it appears to be a fact table.
I think this is what is causing the confusion. As James K. Lowden pointed out, forget relational theory.
Try this:
- Create a ContractDim table with a contract_PK, the original contractID and anything else that relates to the contract EXCEPT the columns that will be summed up
- Create a SalesFact table (or revenue) with all the foreign keys and columns to be summed up like: tonnes and net price. I would pre-calculate some sort of total as well.
This way your cash ticket can link to contractdim (start of a snow-flake schema) or you can have a cash ticket foreign key in your fact table to link directly to the sales values.
You can do the same with CashTicketDim and PayoutFact. This way you would never link the SalesFact to the PayoutFact.