Let’s say I want to classify samples into three classes (A, B, C). This problem can be divided into three binary classification datasets as follows:
Binary Classification 1: A vs [B, C]
Binary Classification 2: B vs [A, C]
Binary Classification 3: C vs [A, B]
As you know, in the One-Vs-Rest approach, the class whose binary classifier outputs the largest score is chosen as the prediction. Suppose a new sample comes in and the three binary classifiers output the following probabilities:
Probability of A: 70%
Probability of B: 80%
Probability of C: 75%
then the final predicted class is B.
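In code, this is just an argmax over the three probabilities (a sketch using the numbers above):

```python
# Standard One-vs-Rest decision: pick the class with the highest probability.
probs = {"A": 0.70, "B": 0.80, "C": 0.75}
predicted = max(probs, key=probs.get)
print(predicted)  # B
```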
Let us also assume that the training performance scores (obtained through cross-validation) of these three binary classifiers are as follows:
A vs [B, C] : 70%
B vs [A, C] : 60%
C vs [A, B] : 65%
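These scores come from cross-validating each binary classifier separately, for example like this (assuming the classifiers from the sketch above; the exact numbers will of course differ on real data):

```python
# Per-classifier cross-validation accuracy, used as a "training performance score".
from sklearn.model_selection import cross_val_score

cv_scores = {}
for cls, clf in binary_classifiers.items():
    y_binary = (y == cls).astype(int)
    cv_scores[cls] = cross_val_score(clf, X, y_binary, cv=5, scoring="accuracy").mean()
```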
I would like to take the training performance scores into account when choosing the final class. My idea is to multiply each binary classifier's output probability by its training performance score. That is,
A vs [B, C] : 70% (CV score) * 70% (probability) = 49%
B vs [A, C] : 60% (CV score) * 80% (probability) = 48%
C vs [A, B] : 65% (CV score) * 75% (probability) = 48.75%
then the final predicted class is A.
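Put differently, my proposal is simply to weight each probability by the corresponding cross-validation score before taking the argmax (a sketch with the numbers from this question):

```python
# Weighted One-vs-Rest decision: multiply each probability by the classifier's
# cross-validation score, then pick the class with the largest product.
probs     = {"A": 0.70, "B": 0.80, "C": 0.75}   # binary classifier outputs
cv_scores = {"A": 0.70, "B": 0.60, "C": 0.65}   # training performance scores

weighted = {cls: probs[cls] * cv_scores[cls] for cls in probs}
# weighted is approximately {"A": 0.49, "B": 0.48, "C": 0.4875}
predicted = max(weighted, key=weighted.get)
print(predicted)  # A
```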
What do you think of taking the training performance scores into account in this way when selecting the final class?
I have searched Google a lot for One-Vs-Rest multi-class classification that takes training performance scores into account, but I could not find a clear answer.