The organisation I work for is very application rich i.e. there are lots of SQL and Oracle databases. We are thinking about a data warehouse.
I believe there is a difference between an Enterprise Resource Planning system and a Data Warehouse. An ERP would involve integrating all the systems together and having one database schema for everything. A Data Warehouse would involve creating a copy database (or copy databases) for analysis reasons. The Data Warehouse could extract information from the original repositories or the ERP system.
Have I understood this correctly?
Are there Data Warehouse products that are available to but or to you have to use Microsoft tools/Oracle tools e.g. SSIS, SSAS etc to write your own?
ERP –> Purchase Ledger, Ordering + Stock Control
Data WareHouse –. Ad Hoc reporting
Or put it another way ERP is generally speaking an on-line system used by the many people in the company. There will usually be a work flow component as in the Foo department orders widgets which need approval from accounts, the order is actually placed by purchasing and the physical installation handled by building services.
A data warehouse is a cumulative copy of various significant data items from other systems in the company. And is used by management to track things like the monthly spend on widgets over the last two years.
1
Expanding on what James said, your live application databases would tend to be build for speed and for strict business rules to ensure data integrity. You might only keep a small amount of the total data in these databases at any particular time (e.g that last 2 days of customer activity), and the focus is on quick response and business integrity.
A data warehouse is generally build for size, with much less focus on speed and/or data integrity. Generally you try and store everything, the idea being that you don’t know what you will need when you are storing it. These systems are not build for speed, as speed on a system that size would be impractical without investing millions in memory and solid state drives. As such they are never accessed by live customer facing applications (because they are so huge they would slow down these applications).
Instead they are build for queries that need access to large amounts of data to get the correct answer. A query on a data warehouse might take from a few seconds to a few days, depending on the complexity. When I worked in a bank we ran a lot of queries for the long term planning guys, to decide things like what customers to target for savings promotion, or to figure out the ideal place to put a new ATM. An example query on a data warehouse would be something like give me the home address of every customer with a current account who has accessed one of these 10 ATMs between 2000 and 2004 more than 3 times a week to withdraw more than $10. That query would be run over literally billions of records (millions of customers, billions of ATM interactions etc), and could take hours to run.
How you build your database and what software you use will depend largely on the type of data you are collecting.
For example if you are collecting logs from your applications, then there are specialist software for large collection of logs, such as Splunk and ElasticSearch
If you collecting a host of different data that doesn’t really all match up in a nice neat fashion something like an Hadoop cluster. Hadoop works on plain text files by splitting the files over a lot of machines and then running small programs on the text files to gather the information you want. Hadoop is open source but there are a lot of companies that have build commercially supported products around Hadoop.
Of course a data warehouse can simply be a large database, such as MySQL, PostgreSQL or Oracle. This might be ideal if you already have all your data formatted into relational tables. In this case you simply copy your data to your data warehouse. MySQL, Oracle and PostgreSQL all have specific features tailored to running the databases as a data warehouse where the emphasis is on speed. A commercial database that specialises in data warehousing is Teradata.
An ERP may be a single product line like SAP, Oracle, (with many sub modules), but it can be separate applications (by manufacturer and technology stack), but are integrated. Data may transfer from one database schema to another. Example would be using SalesForce.com to handle the acquisition of new business but an accounting app is used to manage the billing/post closing of the account.
The data warehouse is build more for reporting (denormalized) and not managing transactions (more normalized). There can also be snapshots for different periods for trending and other point-in-time analysis. Many BI tools work better in these environments.
You may find some ERP systems have a data warehouse component. There are probably data warehouses that are industry specific, but would still need a lot of setup and possible customization. Another benefit is these companies would have inhouse expertise for the technology and industry.
I realize terminology can help with communication, but I would decide on what your needs are and then find solutions. These are very general terms with many meanings. Personally, I’d rather call it Project X than have a bunch of people call it ERP when there is a large data warehouse component.