I am researching probability calibration and have some questions about how to properly evaluate calibration on a highly imbalanced dataset, e.g., a fraud-detection dataset from Kaggle.
I know there are many calibration techniques, such as Platt scaling, isotonic regression, beta calibration, SplineCalib, and Venn-ABERS. There are also multiple metrics for measuring how well calibration works, such as ECE, Brier score, log loss, and the calibration curve (reliability diagram).
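For concreteness, here is how I understand ECE: a weighted average of the gap between mean predicted probability and observed positive rate within probability bins. A minimal sketch (the equal-width binning and `n_bins=10` are my arbitrary choices; ECE is known to be sensitive to the binning scheme):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: weighted average gap between mean predicted probability
    and observed frequency of the positive class, per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index (0 .. n_bins-1) for each prediction, using interior edges
    ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = ids == b
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.2, 0.15, 0.9, 0.8, 0.3, 0.7, 0.05])
print(expected_calibration_error(y_true, y_prob))
```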
I also want to test this on boosting algorithms: XGBoost and LightGBM.
So,
- Which calibration method should I use for an imbalanced dataset?
- Which metric is most suitable for such a task?
- Could you provide some resources (books, YouTube videos, articles) to learn more about different approaches and when I should use them?
I tried Platt scaling and isotonic regression on the XGBoost model and compared the ECE, Brier score, and log-loss values before and after calibration; all three metrics got worse after calibration.