I am researching and building a data warehousing solution for my firm. The requirements are:
- We receive close to 2mm (2 million) records per day, each with 100 columns.
- Users should be able to query and run analytics on at least one year of data as quickly as possible (2mm * 365, or roughly 730 million records).
- Most of our legacy code is written in Scala, so a solution with good Scala support is a plus.
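For a rough sense of scale, the volume in the requirements can be worked out as below. This is a minimal back-of-envelope sketch in Scala; the 8-bytes-per-value average (`avgBytesPerValue`) is a hypothetical placeholder, not a measured figure, and the real on-disk size will depend on the column types and on each engine's compression.

```scala
// Back-of-envelope sizing for one year of raw data.
// ASSUMPTION: avgBytesPerValue = 8 is a hypothetical placeholder; replace it
// with a figure measured from a sample of the real data.
object SizingEstimate {
  val rowsPerDay: Long = 2000000L   // "close to 2mm records per day"
  val columns: Long = 100L          // 100 columns per record
  val days: Long = 365L             // one year of history
  val avgBytesPerValue: Long = 8L   // hypothetical average, not measured

  def rowsPerYear: Long = rowsPerDay * days
  def valuesPerYear: Long = rowsPerYear * columns
  def rawBytesPerYear: Long = valuesPerYear * avgBytesPerValue

  def main(args: Array[String]): Unit = {
    println(s"Rows per year:   $rowsPerYear")
    println(s"Values per year: $valuesPerYear")
    // Convert bytes to GiB (2^30 bytes) for a human-readable figure.
    println(f"Raw size (GiB):  ${rawBytesPerYear.toDouble / (1L << 30)}%.1f")
  }
}
```

With these inputs it comes to 730 million rows and 73 billion values per year, or about 584 GB (roughly 544 GiB) uncompressed at the assumed 8 bytes per value.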
Database/data warehouse options I am considering (AWS-based):
- Redshift
- RDS
- Aurora
- Hosting MySQL on an EC2 instance
Data warehouse options (non-AWS):
- Snowflake
- BigQuery
Any suggestions?
Thank you for your help.