How to split and store big data reports
We store device log content in Elasticsearch, which amounts to about 150 GB of data per month.
The report we need summarizes various data over an input time period. Most of the query time is spent on aggregating, grouping, and sorting to obtain the latest data.
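To make the query shape concrete, here is a minimal sketch of the kind of request described above: filter by the input time period, group, and keep only the latest document per group. It assumes the grouping key is a device identifier. The field names `device_id` and `@timestamp`, the function name, and the bucket limit are all hypothetical and not taken from the actual index mapping. The function only builds the search body as a plain dictionary, so it can be inspected without a running cluster.

```python
from datetime import datetime, timezone


def build_latest_per_device_query(start, end,
                                  device_field="device_id",   # hypothetical field name
                                  time_field="@timestamp",    # hypothetical field name
                                  max_devices=1000):          # assumed upper bound on groups
    """Build an Elasticsearch search body that returns the newest
    document per device within the half-open interval [start, end)."""
    return {
        # No plain hits are needed, only the aggregation results.
        "size": 0,
        "query": {
            "bool": {
                # A filter clause skips scoring and can be cached.
                "filter": [
                    {"range": {time_field: {"gte": start.isoformat(),
                                            "lt": end.isoformat()}}}
                ]
            }
        },
        "aggs": {
            "by_device": {
                # Group documents by the device key.
                "terms": {"field": device_field, "size": max_devices},
                "aggs": {
                    "latest": {
                        # Keep one document per group, newest first.
                        "top_hits": {
                            "size": 1,
                            "sort": [{time_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_latest_per_device_query(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```

The resulting `body` would be passed to the search API of whatever client is in use. Whether this shape is fast enough at 150 GB per month depends on the mapping and shard layout, which are not described here.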
Should I migrate to Amazon AWS or rent HPC clusters to efficiently read and process 54 simulation data files, each with 10 million rows?
I have 54 data files generated by a simulation. Each file has 10 million rows, and each file is several GB in size.
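For reference, this is a minimal sketch of how one such file could be processed as a stream, so that memory use stays flat regardless of file size. It assumes the files are CSV with a header row, which is not stated above, and the function name and the column name `value` are hypothetical. It uses only the Python standard library.

```python
import csv


def summarize_column(path, column):
    """Stream one CSV file row by row and return (row_count, column_sum).

    Only one row is held in memory at a time, so a multi-GB file
    does not need to fit in RAM.
    """
    count = 0
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # assumes the first line is a header
        for row in reader:
            total += float(row[column])
            count += 1
    return count, total
```

Calling `summarize_column("run_01.csv", "value")` on each of the 54 files in turn (the file name is a placeholder) gives per-file totals that can be combined afterwards. Because the files are independent, the same function could also be run on several files in parallel.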