I’m training a model in PyTorch to do image-to-image processing. My dataset is huge, with shape (64152, 3, 5, 2, 64, 144), so I’m using memory-mapping to keep RAM usage as low as possible.
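For context, here is a minimal sketch of how a memmapped array like this can be wrapped for PyTorch; the file path, dtype, and class name are placeholders for illustration, not my actual code:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class MemmapDataset(Dataset):
    """Serves one sample at a time from a read-only on-disk memmap."""
    def __init__(self, path="data.dat", shape=(64152, 3, 5, 2, 64, 144),
                 dtype=np.float32):
        # np.memmap keeps the array on disk; only the pages that are
        # actually indexed get pulled into memory
        self.data = np.memmap(path, mode="r", dtype=dtype, shape=shape)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies just this one sample into RAM
        return torch.from_numpy(np.array(self.data[idx]))
```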
The issues I’m facing due to the sheer size of this dataset are:
1.) shuffling before splitting into train/test sets
2.) normalizing data
Even with memmapping, these operations cause a memory spike that inevitably exceeds the amount of memory available on my machine.
For normalization, I’ve tried working on the memmapped array in smaller slices at a time (roughly as in the sketch below), but I’m not sure what to try for shuffling the dataset before splitting it into train and test sets.
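This is roughly the chunked approach I’ve been attempting for normalization; the chunk size and the assumption of a single global mean/std are just for illustration:

```python
import numpy as np

def chunked_mean_std(mm, chunk=1024):
    """Two-pass global mean/std over the first axis of a memmap,
    loading only `chunk` samples into RAM at a time."""
    n = mm.shape[0]
    total, count = 0.0, 0
    for i in range(0, n, chunk):
        block = np.asarray(mm[i:i + chunk], dtype=np.float64)
        total += block.sum()
        count += block.size
    mean = total / count

    sq_sum = 0.0
    for i in range(0, n, chunk):
        block = np.asarray(mm[i:i + chunk], dtype=np.float64)
        sq_sum += ((block - mean) ** 2).sum()
    std = np.sqrt(sq_sum / count)
    return mean, std
```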