I am trying to design and build Data Reporting Platform.
Data sources
- 7 PostgreSql databases with different schemes and data.
- Mixpanel, accessed via API
We are trying to migrate all data to Redshift, and then use Lambda to query this data, and provide it for external stakeholders, but all our databases are built with tenancy paradigms, that are built using columns of array type for every row in the database.
Redshift does not native support array type of column, and it requires some transformation between PostgreSQL and Redshift.
We would like to have a solution that will be easy to support, and I feel that a custom ETL pipeline that requires all-time support, will not suit our needs.
Main requirements:
- We need to have the possibility to query data from all PostgreSQL databases simultaneously
- We need to provide API for external stakeholders, to query this data
I have tried to analyze different approaches, but my knowledge really limited in this field. So I would like to use the easiest possible approach, but an approach that still has possibilities to be extended.
I can use any of AWS services or any open-source software.
DKh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.