How to Optimize Memory Usage for Cross-Validation of Large Datasets
I have a very large DataFrame (~200 GB) of features, and I want to cross-validate a random forest model on these features.
The features were extracted with a Hugging Face model and are stored as a .arrow file.
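Roughly, this is what I am trying to do (a minimal sketch; the file path, the column names `features`/`label`, the 5-fold split, and the assumption of a classification task are all placeholders for my actual setup):

```python
import numpy as np
from datasets import Dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Load the .arrow file memory-mapped; this alone does not pull 200 GB into RAM
ds = Dataset.from_file("features.arrow")  # placeholder path

kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, test_idx) in enumerate(kf.split(np.arange(len(ds)))):
    # .select() keeps the data memory-mapped; converting to numpy below is
    # what actually materializes the selected rows in memory
    train = ds.select(train_idx).with_format("numpy")
    test = ds.select(test_idx).with_format("numpy")

    X_train, y_train = train["features"], train["label"]  # placeholder column names
    X_test, y_test = test["features"], test["label"]

    clf = RandomForestClassifier(n_jobs=-1)
    clf.fit(X_train, y_train)
    print(f"fold {fold}: accuracy = {accuracy_score(y_test, clf.predict(X_test)):.4f}")
```

The problem is that materializing the feature matrix for each fold blows up memory well before training starts. How can I restructure this so the cross-validation runs without loading the whole dataset (or even a full fold) into RAM at once?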