I’ve just joined a company, and I’m so confused about what I’m working with. Is it a Data Warehouse, a Data Lake, a Data Mart, or an ODS?
Here is my case:
- We’re working with telecommunication data (phone numbers and their calls, messages, and recharges). We integrate data from many sources such as MySQL and Kafka, do some processing, and then load it all into a database on HDFS, managed by Hive.
- Inside that Hive “database” there are many tables, such as Call (records of calls from/to each phone number), Message (the same as Call, but for messages), and Charge (one record each time a phone number is recharged). The tables aren’t related to each other, so there is no “Dimension” or “Fact” table.
- Based on that data, we do a bit more processing and then calculate a credit score for each phone number (we compute many model variables per phone number from the Hive “database” above). This is our final purpose (Machine Learning – AI).
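To make that last step concrete, here is a minimal Python sketch of the kind of per-number aggregation I mean. The column names (`phone`, `duration_sec`, `amount`), the feature names, and the sample rows are all made up for illustration; the real job reads from the Hive tables rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical rows standing in for the Call, Message, and Charge tables.
# All column names and values here are assumptions, not our real schema.
calls = [
    {"phone": "0901", "duration_sec": 120},
    {"phone": "0901", "duration_sec": 60},
    {"phone": "0902", "duration_sec": 30},
]
messages = [
    {"phone": "0901"},
    {"phone": "0902"},
    {"phone": "0902"},
]
charges = [
    {"phone": "0901", "amount": 50000},
    {"phone": "0902", "amount": 20000},
    {"phone": "0902", "amount": 10000},
]


def build_features(calls, messages, charges):
    """Aggregate each independent table into one feature row per phone number."""
    features = defaultdict(lambda: {
        "call_count": 0,
        "total_call_sec": 0,
        "message_count": 0,
        "recharge_count": 0,
        "total_recharge": 0,
    })
    for row in calls:
        f = features[row["phone"]]
        f["call_count"] += 1
        f["total_call_sec"] += row["duration_sec"]
    for row in messages:
        features[row["phone"]]["message_count"] += 1
    for row in charges:
        f = features[row["phone"]]
        f["recharge_count"] += 1
        f["total_recharge"] += row["amount"]
    return dict(features)


features = build_features(calls, messages, charges)
```

The only thing the tables share is the phone number, which acts as the grouping key; nothing is joined through surrogate keys the way a star schema would do it.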
According to the Data Warehouse definitions I find on the internet, a DWH is a place that holds data from many sources, both historical and current, and makes that data available for analysis, reporting, data mining, BI, DA, DS, AI/ML, and similar purposes. Our Hive “database” satisfies all of those conditions. It is also “Subject-oriented”, “Integrated”, “Nonvolatile”, and “Time-variant”. The only problem is that it doesn’t have Dimension and Fact tables.
So is our Hive “database” a Data Warehouse, a Data Lake, a Data Mart, or an ODS? Must a Data Warehouse have dimension and fact tables?