I’m trying to implement a multi-label classification model. The dataset is very small: I only have 900 rows, and it is not image data. The task is closer to tag classification. My target y has 12 columns, each taking the value 0 or 1, and several of them can be 1 for the same row. The problem is that for some columns the value 1 appears in fewer than 4% of the rows.
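To make the imbalance concrete, here is a minimal sketch of how the per-label frequency could be measured. It assumes the labels sit in a NumPy array of shape (n_rows, n_labels); the function names are illustrative and the demo matrix is synthetic, not my real data.

```python
import numpy as np

def label_frequencies(Y):
    """Fraction of rows in which each label column equals 1."""
    return np.asarray(Y).mean(axis=0)

def imbalance_ratios(Y):
    """Count of the most frequent label divided by the count of each label.

    A label that never occurs gets an infinite ratio.
    """
    counts = np.asarray(Y).sum(axis=0).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = counts.max() / counts
    return np.where(counts > 0, ratios, np.inf)

if __name__ == "__main__":
    # Synthetic stand-in with the same shape as the dataset described above:
    # 900 rows, 12 binary labels, three of them made deliberately rare.
    rng = np.random.default_rng(0)
    probs = np.array([0.30] * 9 + [0.03] * 3)
    Y_demo = (rng.random((900, 12)) < probs).astype(int)
    print(np.round(label_frequencies(Y_demo), 3))
    print(np.round(imbalance_ratios(Y_demo), 1))
```

Labels with a frequency below the 4% mark (or a large imbalance ratio) are the ones an oversampling step would need to target.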
So I was looking for an oversampling method and came across a paper describing the MLSMOTE technique, but it doesn’t work very well with my data. Has anyone already faced a similar problem? I’m using some modules from scikit-multilearn, but none of them handles oversampling. Additionally, the imblearn library does not address multi-label classification problems.
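As a point of comparison, the simplest baseline is plain random oversampling of the rows that carry a rare label. The sketch below is not MLSMOTE and creates no synthetic samples; it only duplicates existing rows. The function name, the 4% threshold, and the 10% target are assumptions chosen for illustration.

```python
import numpy as np

def oversample_minority_rows(X, Y, threshold=0.04, target=0.10, seed=0):
    """Duplicate rows that carry a rare label (naive random oversampling).

    A label is "rare" if its frequency in Y is below `threshold`. For each
    rare label, rows containing it are cloned at random until its frequency
    reaches `target` at that point in the loop. Clones added later for other
    labels can shift the final frequencies slightly.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    X, Y = np.asarray(X), np.asarray(Y)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(Y))  # row indices of the resampled dataset
    rare_labels = np.where(Y.mean(axis=0) < threshold)[0]
    for j in rare_labels:
        positives = np.where(Y[:, j] == 1)[0]
        if len(positives) == 0:
            continue  # nothing to clone for a label that never occurs
        n, n_pos = len(idx), Y[idx, j].sum()
        # Smallest k with (n_pos + k) / (n + k) >= target
        k = int(np.ceil((target * n - n_pos) / (1 - target)))
        if k > 0:
            clones = rng.choice(positives, size=k, replace=True)
            idx = np.concatenate([idx, clones])
    return X[idx], Y[idx]
```

Because the clones are exact copies, this should only be applied to the training split; otherwise duplicates of the same row end up on both sides of the train/validation boundary. Cloned rows also carry their other labels with them, so frequent labels that co-occur with rare ones get inflated as well.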
Thanks in advance to anyone who answers.