I have an analysis framework that operates on large datasets loaded from other workbooks. The sets comprise headered columns, each containing a single metric, with multiple columns per sheet and each sheet holding its own dataset.
I’d like some broad advice on the best approach to loading, processing and referencing large sets.
I’d like to be able to dynamically load an external sheet, extract the column headers, create arrays based on those headers and fill those arrays.
Currently, I’m loading each metric column into its own array for processing and referencing; however, I have to explicitly declare each array and name it after its column header.
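For illustration, something like the following is roughly what I have in mind (a rough sketch only, assuming a late-bound Scripting.Dictionary; LoadMetrics and the assumed layout of headers in row 1 with data below are placeholders, not my actual code):

```vba
' Build a dictionary keyed by column header, where each item is a 2-D
' Variant array holding that metric's values (read from the sheet in one
' bulk operation per column).
Public Function LoadMetrics(ByVal wsSource As Worksheet) As Object
    Dim metrics As Object
    Set metrics = CreateObject("Scripting.Dictionary")

    Dim lastCol As Long, lastRow As Long, col As Long
    lastCol = wsSource.Cells(1, wsSource.Columns.Count).End(xlToLeft).Column

    For col = 1 To lastCol
        Dim header As String
        header = CStr(wsSource.Cells(1, col).Value)
        lastRow = wsSource.Cells(wsSource.Rows.Count, col).End(xlUp).Row
        If Len(header) > 0 And lastRow >= 2 Then
            ' One bulk read per column; the result is a (1 To n, 1 To 1) array
            metrics(header) = wsSource.Range(wsSource.Cells(2, col), _
                                             wsSource.Cells(lastRow, col)).Value
        End If
    Next col

    Set LoadMetrics = metrics
End Function
```

A value could then be referenced as, say, metrics("Revenue")(10, 1) (assuming a "Revenue" column exists) without my having to declare a Revenue array anywhere.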
Specifically:
- Is loading the data into arrays even required, as opposed to leaving it in the worksheet and using, for example, a dictionary to map headers to columns and referencing values via Worksheet.Cells plus the dictionary header lookup (see the sketch after this list)? I imagine using arrays is faster?
- At what point does memory use become an issue? Does this differ depending on whether the data is left in sheets and referenced by cell, or loaded into arrays? I currently have approx. 160K cells of data, but what if that grows to 1.6M, or 16M?
- ChatGPT recommends loading the arrays using a dictionary, with a sub-dictionary for each metric/header. Is this a decent approach?
- Is there any other approach that can A) extract data from an external source, B) make it easily referenceable, and C) do so dynamically?
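To make the first bullet concrete, this is roughly what I mean by leaving the data in place and referencing it through a header dictionary (again only a sketch; MapHeaders and the sheet layout are assumptions, not my real code):

```vba
' Map each header in row 1 to its column number, so values can be read
' directly from the sheet with ws.Cells(row, headerCols("SomeMetric")).
Public Function MapHeaders(ByVal ws As Worksheet) As Object
    Dim headerCols As Object
    Set headerCols = CreateObject("Scripting.Dictionary")

    Dim lastCol As Long, col As Long
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column

    For col = 1 To lastCol
        If Len(CStr(ws.Cells(1, col).Value)) > 0 Then
            headerCols(CStr(ws.Cells(1, col).Value)) = col
        End If
    Next col

    Set MapHeaders = headerCols
End Function
```

My understanding is that every Worksheet.Cells read is a separate call back into Excel, whereas the first sketch does one bulk read per column into memory, which is why I imagine arrays are faster once I'm looping over many rows, but I'd be happy to be corrected on that.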
Apologies for the broad scope of the question. I'm able to get the thing working; I'm just after advice on how best to approach the problem, and I'm having a hard time finding advice specific to my situation online.
Thanks in advance for any advice.