I have a large set of data, 4000 rows+. each row is associated with metric data that is collected daily.
for example:
Item table:
Id | Item |
---|---|
1 | Hello |
2 | World |
… | … |
Metric table:
Id | ItemId | CreatedDate | Metric 1 | Metric 2 |
---|---|---|---|---|
1 | 1 | 2024-04-20 | 34 | 21 |
2 | 1 | 2024-04-21 | 54 | 12 |
3 | 1 | 2024-04-22 | 32 | 23 |
4 | 2 | 2024-04-20 | 53 | 43 |
5 | 2 | 2024-04-21 | 54 | 23 |
6 | 2 | 2024-04-22 | 12 | 45 |
… | … | … | … | … |
I need to display the item table to users with an additional column that is Rank and the rank would be based off of the average of Metric 1 data, whats compounding the problem is that users can order by columns and set a date range.
To calculate the rank, i have done the following:
- Get all items and their metric data within a date range
- Calculate the average Metric 1 value for each item
- Order by the average, the index would give us a rank to associate with an item
- If the user orders the view table by average of Metric 2, the rank of items wouldn’t change
Whilst this works as expected, it is computationally very expensive and takes a long time as it has to iterate over 4000 rows + a set of metrics for each, to further add to this there is an indicator to say whether its gone up or down in rank by comparing the current date range(e.g. 7 day range) with its previous date range(e.g. previous 7 days) this effectively doubles the load.
The data is on a SQL server.
Is there anyway to either make this more performant/faster for the user?
The main problem for me is that there are too many changeable variables to be able to cache/store values.
Being able to filter by a daterange, means i cant store the data and calculations have to be done inline.
What i have works and meets requirements however it is very slow, 40 seconds give or take.
For more common requests such as page load, there is cache that is prebuilt.
Which works but again, should the user change the orderby value or daterange it would have to calculate everything.
Are there any technologies available that speed this kind of process up?
Are there any methods in which data can be stored differently to speed the calculation process up?