I am writing a browser-based tool to manipulate and visualise data (with D3.js). Currently, I store data in a JSON format, where each table is an object and each column holds its values in an array, e.g.:

```json
{
  "data": {
    "timeString": {
      "name": "time",
      "type": "time",
      "data": ["6/21/2024, 12:00:00 AM", "6/21/2024, 12:15:00 AM"],
      "timeFormat": "M/D/YYYY hh:mm:ss AP",
      "timeMins": ["0.0000", "0.2500"]
    },
    "value0": { "name": "value0", "type": "value", "data": [5, 1] },
    "value1": { "name": "value1", "type": "value", "data": [4, 2] }
  }
}
```
Because I want to manipulate the data, I also keep a manipulated (processed) copy alongside the raw values, e.g.:

```json
{
  "data": {
    "timeString": {
      "name": "time",
      "type": "time",
      "data": ["6/21/2024, 12:00:00 AM", "6/21/2024, 12:15:00 AM"],
      "timeFormat": "M/D/YYYY hh:mm:ss AP",
      "timeMins": ["0.0000", "0.2500"]
    },
    "value0": { "name": "value0", "type": "value", "data": [5, 1] },
    "value1": {
      "name": "value1",
      "type": "value",
      "data": [4, 2],
      "processSteps": [{ "add": [5] }],
      "processedData": [9, 7]
    }
  }
}
```
This costs extra memory but improves performance. Likewise, each graph keeps a similar copy of its data, which is another memory overhead that again improves performance.
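For context, a process step is applied column-wise, roughly like this (a simplified sketch; `applyProcessSteps` and the handling of the `add` step just mirror the shape of the example above, they are not my exact code):

```javascript
// Simplified sketch: derive processedData from a column's raw data and its
// processSteps. Only the "add" step from the example is shown; other step
// types would be handled the same way.
function applyProcessSteps(data, processSteps) {
  let out = data.slice(); // copy so the raw column is left untouched
  for (const step of processSteps) {
    if (step.add) {
      const [amount] = step.add;
      out = out.map(v => v + amount);
    }
    // ...other step types would go here
  }
  return out;
}

// e.g. applyProcessSteps([4, 2], [{ "add": [5] }]) -> [9, 7]
```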
Some of my datasets are now growing to approximately 50k rows. I’ve had to rewrite some array operations (like Math.min(...arr)) because maximum call stack size errors crept in.
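For reference, the workaround was to replace the spread with a plain loop, which stays stack-safe regardless of array length, something like:

```javascript
// Stack-safe replacement for Math.min(...arr): a single pass instead of
// spreading tens of thousands of values onto the call stack.
function arrayMin(arr) {
  let min = Infinity;
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] < min) min = arr[i];
  }
  return min;
}

// The same pattern works for max. Since D3 is already loaded, d3.min(arr),
// d3.max(arr) and d3.extent(arr) do the same job (and skip null/NaN values).
```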
This has made me reconsider whether my approach is the most efficient one. I have thought about a single, row-oriented JSON structure instead, like:
{"data":{"names":["time","value0","value1"],"types":["time","value","Value"], "timeFormat":["D/M/YYYY hh:mm:ss AP",-1,-1],"data":[{"time":"6/21/2024, 12:00:00 AM","timeMins":0.0000, "value0":5, "value1":4}, {"time":"6/21/2024, 12:15:00 AM","timeMins":0.2500, "value0":1, "value1":2}],"processSteps":{"value0":[{"add":[5]}]},"processedData":[{"value0":9},{"value0":7}]}}
I’ve considered removing the duplicated processed data, but I haven’t yet run into memory issues and the efficiency saving is large, since a visualisation re-render doesn’t need to recalculate the processing steps.
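What I mean by not recalculating is essentially per-column caching, roughly like this (a sketch; `getProcessedData` and the `_cache` field are illustrative, and `applyProcessSteps` is the sketch from earlier):

```javascript
// Sketch: reuse the cached processedData on re-render and only recompute
// when the process steps change.
function getProcessedData(column) {
  const key = JSON.stringify(column.processSteps || []);
  if (!column._cache || column._cache.key !== key) {
    column._cache = {
      key: key,
      data: applyProcessSteps(column.data, column.processSteps || []),
    };
  }
  return column._cache.data;
}
```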
I’ve also thought about using IndexedDB, but that seems to add a layer of complexity for not much apparent gain.
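To illustrate the complexity I mean, even a minimal IndexedDB round trip needs this much boilerplate (a sketch using the standard API; the database, store, and key names are made up):

```javascript
// Minimal IndexedDB round trip: open a database, create an object store,
// then put/get a whole dataset under a single key.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("vis-tool", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("datasets");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function saveDataset(key, dataset) {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction("datasets", "readwrite");
    tx.objectStore("datasets").put(dataset, key);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

async function loadDataset(key) {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const req = db.transaction("datasets").objectStore("datasets").get(key);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}
```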
Does anyone have advice on how best to architect the data for memory and performance in this use case?