I have been working on an ODS as well as a data warehouse for a long time. Both integrate a wide variety of data sources from stovepipe applications. One of the uses of the ODS is to provide data to other stovepipe applications.
Imagine one app maintains a database of personnel, and another app tracks sales. Occasionally the Sales app may need a drop-down of personnel that someone can pick from, say to credit a particular employee with a sale/commission.
The Sales application can query the ODS to get the list of personnel. If the Personnel app changes its data structure, only the ETL process that feeds the ODS has to adapt, so the other apps consuming that data are not impacted by the change.
The Sales app will need to save a PersonnelID with that sale/commission record in its own database. However, the next time the ODS is refreshed, if it uses a full load technique, the key will change. Because the PersonnelID is stored in the Sales database, which is a separate database, there is no straightforward way to cascade that change.
This creates a challenge: any change to the ODS has to be made very carefully, and certain designs may even be ruled out because external applications depend on those keys never changing. I would usually avoid exposing keys to users, but in this case it seems necessary so that external apps can reference enterprise-wide entities in their own databases.
The same goes for lookup tables that are available in the ODS, where a lookup table has keys and text.
After a full load of the ODS I can ensure the keys satisfy referential integrity within the database, but not with external databases utilizing those keys. Since there are some parts of the ODS currently coded as a full load, which would cause keys to be regenerated, I would need to recode that ETL to be incremental, so that external databases can reference those keys without fear of them changing.
What techniques are used when you have an enterprise-wide data source, and other applications consume that data and need to store foreign keys referencing entities in it? How do you decouple the foreign key references as much as possible without complicating access to that data?
Currently I am using table-valued functions to provide access to data. I chose this approach because it allows parameters and joins, and it decouples access from underlying tables that may change later.
You have to adhere to these rules:
1. OLTP applications don't change keys.
2. The ODS generates its own Business Intelligence keys.
3. The data warehouse never references OLTP keys and must use the keys generated by the ODS (rule 2).
There is no way to relax any of the above rules unless you do a full load every night.
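One way to picture rule 2 is a key map that lives outside the tables being reloaded, so a key handed out once is reused on every later load. The following is only a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the ODS database, and the names `personnel_key_map`, `personnel`, `full_load`, and `key_of` are hypothetical, not something from the question.

```python
import sqlite3


def ensure_key_map(conn):
    # The key map is never dropped, so ODS keys survive every reload.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS personnel_key_map (
            ods_key       INTEGER PRIMARY KEY AUTOINCREMENT,
            source_system TEXT NOT NULL,
            source_id     TEXT NOT NULL,
            UNIQUE (source_system, source_id)
        )
        """
    )


def full_load(conn, source_system, rows):
    """Rebuild the (hypothetical) personnel table from scratch.

    rows is an iterable of (source_id, name) pairs from the OLTP system.
    """
    ensure_key_map(conn)
    conn.execute("DROP TABLE IF EXISTS personnel")
    conn.execute(
        "CREATE TABLE personnel (ods_key INTEGER PRIMARY KEY, name TEXT)"
    )
    for source_id, name in rows:
        # Allocate an ODS key only the first time this source row is seen.
        conn.execute(
            "INSERT OR IGNORE INTO personnel_key_map (source_system, source_id) "
            "VALUES (?, ?)",
            (source_system, source_id),
        )
        (ods_key,) = conn.execute(
            "SELECT ods_key FROM personnel_key_map "
            "WHERE source_system = ? AND source_id = ?",
            (source_system, source_id),
        ).fetchone()
        conn.execute(
            "INSERT INTO personnel (ods_key, name) VALUES (?, ?)",
            (ods_key, name),
        )
    conn.commit()


def key_of(conn, name):
    # Helper for the example: look up the ODS key exposed to consumers.
    row = conn.execute(
        "SELECT ods_key FROM personnel WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

With this layout the data tables can still be truncated and rebuilt nightly, while the key that an external application such as Sales has stored keeps pointing at the same person. It relies on rule 1: the source system's own identifier must not change.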
Trying to get “deltas” (changed data only) from OLTP is usually a nightmare in large enterprise systems like Siebel, SAP, etc. unless your ETL tool provides “connectors” (canned queries against ERP and similar tools) and you have expertise in the source OLTP system. Even then, it is usually difficult because of the complexity of such enterprise schemas.
Again, you have to adhere to the above rules, and most probably there is no other way. Kimball has a good set of books on the subject if you care to dig for more.
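When the source offers no reliable change tracking, one common fallback is to take a full extract anyway and derive the deltas by comparing a hash of each row against the hash kept from the previous load. The sketch below assumes the extract fits in memory as a dict keyed by the source system's identifier; `row_hash` and `detect_deltas` are hypothetical names used only for illustration.

```python
import hashlib


def row_hash(row):
    # Join the column values with a separator that is unlikely to appear in
    # the data, then hash. Simplification: None and "" hash the same here.
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def detect_deltas(previous_hashes, extract):
    """Compare a fresh full extract with the hashes from the last load.

    previous_hashes: dict of source_id -> hash stored after the last load
    extract:         dict of source_id -> tuple of column values
    Returns (inserted, updated, deleted, current_hashes).
    """
    current = {source_id: row_hash(row) for source_id, row in extract.items()}
    inserted = sorted(k for k in current if k not in previous_hashes)
    updated = sorted(
        k for k in current
        if k in previous_hashes and current[k] != previous_hashes[k]
    )
    deleted = sorted(k for k in previous_hashes if k not in current)
    return inserted, updated, deleted, current
```

The returned `current_hashes` would be persisted and passed in as `previous_hashes` on the next run. This still reads the whole source table each time, so it reduces what gets written to the ODS rather than what gets read from the OLTP system.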