I'm working on an ML project. I trained an XGBoost model through the caret library, and I want to calculate the AIC for this model. Since there's no straightforward way to get it (something like AIC(xgb) doesn't work), I'd like to know how I could do it.
I trained the model with the following setup:
library(caret)

ctrl_xgb <- trainControl(method = "cv", number = 10, search = "grid",
                         summaryFunction = twoClassSummary, classProbs = TRUE)
param_grid_xgb <- expand.grid(nrounds = 500, max_depth = c(3, 6, 9), eta = c(0.01, 0.1, 0.3),
                              gamma = c(0, 0.2, 0.4), subsample = c(0.8, 0.9, 1),
                              colsample_bytree = c(0.8, 0.9, 1), min_child_weight = c(1, 5, 10))
xgb <- train(Target ~ ., data = under, method = "xgbTree", metric = "ROC",
             tuneGrid = param_grid_xgb, trControl = ctrl_xgb, verbosity = 0)
and the best parameters are:
nrounds max_depth eta gamma colsample_bytree min_child_weight subsample
500 9 0.01 0.4 0.8 1 0.9
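(For reference, I read these straight off the fitted object; assuming the train object is named xgb as above, caret keeps the winning combination in bestTune:)

xgb$bestTune   # best hyperparameter combination found by the grid search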
I know there's this formula for AIC:
AIC = 2k − 2 ln(L)
with k being the number of estimated parameters and L the maximized likelihood of the model.
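For instance (made-up numbers, just to illustrate the arithmetic): with k = 3 and ln(L) = −120, this would give AIC = 2·3 − 2·(−120) = 246.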
For the log-likelihood I used this code:

# Bernoulli log-likelihood of the observed labels under the predicted probabilities
ll <- sum(log(ifelse(aicdf$Target == 1, aicdf$xgb_true, 1 - aicdf$xgb_true)))

with aicdf$xgb_true being the model's predicted probability of class 1 on a validation dataset. Correct me if I'm wrong.
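In case it matters, this is roughly how I build aicdf (a sketch; valid is my held-out validation set, and which probability column is the positive class depends on the factor levels of Target):

probs <- predict(xgb, newdata = valid, type = "prob")  # caret returns one probability column per class
aicdf$xgb_true <- probs[[2]]                           # probability of the positive (second) class level
# so that, once k is pinned down:
# aic <- 2 * k - 2 * ll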
If this is a correct procedure to calculate the AIC, what would the number of parameters k be in my case?
Thank you!