How can I split my data into 5 roughly equal-sized bins without splitting groups defined by a categorical metadata variable?
I have a dataframe with approximately 5000 datapoints and I need to create bins for cross-validation. Additionally, I have a categorical metadata variable with around 1000 unique values. To prevent data leakage, I want to ensure that datapoints sharing the same metadata value are never split across different bins. The bins should be approximately the same size, and ideally there would be 5 of them.
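
To make the requirement concrete, here is a minimal sketch of one possible approach: a greedy assignment that places the largest metadata groups first, always into the currently smallest bin. The function name `assign_group_bins` and the toy `metadata` list are hypothetical and only stand in for the real metadata column. This is an illustration under those assumptions, not a definitive solution.

```python
import heapq
from collections import Counter


def assign_group_bins(groups, n_bins=5):
    """Map each group label to a bin index, keeping bin sizes roughly equal.

    Hypothetical helper: greedy heuristic, not guaranteed to be optimal.
    """
    counts = Counter(groups)
    # Min-heap of (current bin size, bin index); the smallest bin is on top.
    heap = [(0, b) for b in range(n_bins)]
    heapq.heapify(heap)
    group_to_bin = {}
    # Place the largest groups first so smaller groups can even out the bins.
    for label, size in sorted(counts.items(), key=lambda kv: -kv[1]):
        bin_size, b = heapq.heappop(heap)
        group_to_bin[label] = b
        heapq.heappush(heap, (bin_size + size, b))
    return group_to_bin


# Toy data standing in for the metadata column (assumed values).
metadata = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 3 + ["e"] * 2 + ["f"] * 2 + ["g"]
group_to_bin = assign_group_bins(metadata, n_bins=5)
# One bin index per datapoint; rows with the same metadata value share a bin.
bins = [group_to_bin[m] for m in metadata]
```

With a pandas dataframe, the same mapping could presumably be applied with something like `df["fold"] = df["meta"].map(group_to_bin)`, where `df`, `"fold"`, and `"meta"` are placeholder names. With about 1000 groups over 5000 rows, most groups are small, so a greedy pass like this should leave the bins close to equal in size. scikit-learn's `GroupKFold` (in `sklearn.model_selection`) is built for this kind of group-aware splitting and may be worth comparing against.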