I am looking for ideas for aggregating prediction outcomes in a way that maximizes the number of classes while minimizing classification error.
As a motivating example, say I’m working on a prediction task to classify songs by genre and there are 6 genres (from the target column below):
| Genre (Broad) | Genre (Target) |
|---|---|
| Pop | Indie pop |
| Pop | Hyperpop |
| Pop | K-Pop |
| Rock | Alt rock |
| Rock | Classic rock |
| Rock | Hard rock |
The model identifies the first four categories (indie pop, hyperpop, K-pop, alt rock) with 100% accuracy, but misclassifies about 50% of hard rock songs as classic rock and about 20% of classic rock songs as hard rock.
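For concreteness, that error pattern can be written as a confusion matrix. The counts below are hypothetical (assuming 100 songs per genre) and just encode the rates described above:

```python
import numpy as np

genres = ["indie pop", "hyperpop", "k-pop", "alt rock",
          "classic rock", "hard rock"]

# Rows = true genre, columns = predicted genre.
# Counts assume 100 songs per genre, purely for illustration.
cm = np.array([
    [100,   0,   0,   0,   0,   0],  # indie pop: perfect
    [  0, 100,   0,   0,   0,   0],  # hyperpop: perfect
    [  0,   0, 100,   0,   0,   0],  # k-pop: perfect
    [  0,   0,   0, 100,   0,   0],  # alt rock: perfect
    [  0,   0,   0,   0,  80,  20],  # classic rock: 20% -> hard rock
    [  0,   0,   0,   0,  50,  50],  # hard rock: 50% -> classic rock
])

accuracy = np.trace(cm) / cm.sum()  # overall accuracy ~ 0.88
```

Merging the last two rows/columns into a single "other rock" class makes this matrix perfectly diagonal, which is exactly the 5-class aggregation above.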
Based on this, one could imagine aggregating the target genres in a few ways that would reduce the classification error, e.g.:
- 2 classes: Pop, rock
- 4 classes: Indie pop, hyperpop, K-pop, rock
- 5 classes: Indie pop, hyperpop, K-pop, alt rock, other rock
In this case, I’d want to aggregate the target genres to those 5 classes, keeping as many categories as I can while maintaining 100% accuracy.
To find the ideal aggregation structure, I could enumerate all possible aggregations and compute the classification error for each. However, this is not computationally feasible given the number of classes I'm working with, since the number of ways to partition the classes grows like the Bell numbers. So, I was wondering if there is some relevant literature that I can read to better understand how to tackle this problem, or if anyone has ideas.
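One cheap heuristic I could imagine instead of exhaustive search: greedily merge the pair of classes with the largest mutual confusion until the aggregated accuracy reaches a target. A sketch (`greedy_merge` is a hypothetical helper operating on a NumPy confusion matrix; it is an approximation, not an exact search over all partitions):

```python
import numpy as np

def greedy_merge(cm, labels, target_acc=1.0):
    """Greedily merge the most-confused pair of classes until the
    aggregated confusion matrix reaches target_acc accuracy.
    Heuristic sketch only; rows = true class, columns = predicted."""
    cm = cm.astype(float).copy()
    groups = [[label] for label in labels]
    while len(groups) > 1:
        if np.trace(cm) / cm.sum() >= target_acc:
            break
        # Pick the pair (i, j) with the largest symmetric confusion.
        off = cm + cm.T
        np.fill_diagonal(off, -1)
        i, j = np.unravel_index(np.argmax(off), off.shape)
        i, j = min(i, j), max(i, j)
        # Merge class j into class i (both rows and columns).
        cm[i, :] += cm[j, :]
        cm[:, i] += cm[:, j]
        cm = np.delete(np.delete(cm, j, axis=0), j, axis=1)
        groups[i] += groups[j]
        del groups[j]
    return groups, np.trace(cm) / cm.sum()
```

On the toy confusion matrix above this would merge classic rock and hard rock first and stop at 5 groups with 100% aggregated accuracy, but greedy merging is not guaranteed to find the optimal partition in general.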
Thank you and sorry if this question is too vague. Happy to edit it to improve it!