I have a data frame df with five columns, ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’, all of which contain Python strings. Currently, ‘B’, ‘C’, ‘D’, and ‘E’ have unbalanced unique values (i.e., some unique values appear in more rows than others). How can I sample df so that columns ‘B’, ‘C’, ‘D’, and ‘E’ are each balanced (i.e., each unique value in a given column appears in the same number of rows)? I want to sample with replacement so that the resulting data frame has the same length as the original, though some rows may be duplicated and some may be omitted. Thanks!
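To make the setup concrete, here is a minimal sketch with made-up data (the values, proportions, and the helper name `balance_one_column` are hypothetical, not from the real data frame). It shows one way to balance a single column by sampling with replacement using inverse-frequency weights. It does not address balancing ‘B’, ‘C’, ‘D’, and ‘E’ at the same time, which is the actual question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five string columns, with B..E deliberately unbalanced.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "A": [f"row{i}" for i in range(n)],
    "B": rng.choice(["x", "y", "z"], size=n, p=[0.7, 0.2, 0.1]),
    "C": rng.choice(["u", "v"], size=n, p=[0.9, 0.1]),
    "D": rng.choice(["p", "q", "r"], size=n, p=[0.5, 0.4, 0.1]),
    "E": rng.choice(["m", "n"], size=n, p=[0.6, 0.4]),
})


def balance_one_column(frame, col, seed=0):
    """Resample `frame` with replacement so `col` is balanced in expectation.

    Each row is weighted by 1 / (number of rows sharing its value in `col`),
    so every unique value carries the same total sampling weight.
    """
    counts = frame[col].map(frame[col].value_counts())
    weights = 1.0 / counts
    # DataFrame.sample normalizes the weights so they sum to 1.
    return frame.sample(n=len(frame), replace=True, weights=weights,
                        random_state=seed)


balanced_b = balance_one_column(df, "B")
print(df["B"].value_counts(normalize=True))          # unbalanced shares
print(balanced_b["B"].value_counts(normalize=True))  # roughly equal shares
```

The result has the same length as df, and the shares of ‘B’ are equal only in expectation, not exactly. After this resampling, ‘C’, ‘D’, and ‘E’ are generally still unbalanced.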