I trained several models for a classification problem and I would like to calculate the Brier Score for the predictions I have.
However, I’m not sure what I should pass to the DescTools function (i.e., BrierScore(x, pred = NULL, scaled = FALSE, ...)).
So far this is the dataframe I created for the probabilities of one of the models:
<code> X0 X1 real_scores max_prob max_source
1 0.024 0.976 1 0.976 1
2 0.910 0.090 0 0.910 0
3 0.524 0.476 0 0.524 0
4 0.942 0.058 0 0.942 0
5 0.944 0.056 0 0.944 0
6 0.074 0.926 1 0.926 1
7 0.254 0.746 0 0.746 1
8 0.864 0.136 0 0.864 0
9 0.522 0.478 0 0.522 0
10 0.422 0.578 1 0.578 1
11 0.772 0.228 0 0.772 0
12 0.564 0.436 0 0.564 0
13 0.968 0.032 0 0.968 0
14 0.760 0.240 0 0.760 0
15 0.978 0.022 0 0.978 0
16 0.818 0.182 1 0.818 0
17 0.730 0.270 0 0.730 0
18 0.824 0.176 0 0.824 0
19 0.962 0.038 0 0.962 0
20 0.514 0.486 0 0.514 0
21 0.360 0.640 0 0.640 1
22 0.708 0.292 0 0.708 0
23 0.940 0.060 0 0.940 0
24 0.916 0.084 0 0.916 0
25 0.606 0.394 1 0.606 0
26 0.838 0.162 0 0.838 0
27 0.742 0.258 0 0.742 0
28 0.850 0.150 1 0.850 0
</code>
Where:
- X0 = Probability for the negative class
- X1 = Probability for the positive class
- real_scores = 1 (positive class); 0 (negative class)
- max_prob = maximum of the two class probabilities (X0, X1)
- max_source = the class that maximum probability comes from (0 for X0, 1 for X1)
What I don't understand is whether I should pass to the function only the probability for X1 (positive class), both probabilities, or the maximum.
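For reference, the definition I have in mind for a binary outcome is the mean squared difference between the predicted probability of the positive class and the observed outcome. The symbols N (number of observations), p_i (predicted probability) and o_i (observed 0/1 outcome) are my own notation, not taken from the DescTools documentation:

```latex
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} \left( p_i - o_i \right)^2
```

Under this definition, p_i would correspond to X1 and o_i to real_scores, so row 1 would contribute (0.976 - 1)^2 = 0.000576 to the sum. What I cannot tell is whether BrierScore expects its arguments in this form.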