I am finding that the nrounds returned by cross-validation for xgboost (cv$best_iteration) is highly variable from run to run. This of course translates into models with varying performance. It is especially a problem when I compare two different models, because the comparison depends on the nrounds I set for each model based on separate cross-validations. Any suggestions for reducing this variability? Below is an example of the code I am running, its output, and a sketch of how the comparison step then looks:
n_runs <- 10                        # repeat the CV several times to see how much nrounds moves
best_iterations <- numeric(n_runs)

for (i in 1:n_runs) {
  set.seed(i)  # different seed per run, so the folds change but each run stays reproducible
  cv <- xgb.cv(data = dtrain, nrounds = 1000, nthread = 14, nfold = 4,
               early_stopping_rounds = 20, metrics = "rmse",
               max_depth = 3, eta = 0.1, objective = "reg:tweedie", weight = df$EHY)
  best_iterations[i] <- cv$best_iteration
}
# Best iteration chosen in each run, plus the mean and minimum across runs
print(best_iterations)
mean(best_iterations)
min(best_iterations)

> print(best_iterations)
[1] 214 221 207 241 197 229 210 245 223 159
> mean(best_iterations)
[1] 214.6
> min(best_iterations)
[1] 159
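
For reference, the comparison I describe looks roughly like the sketch below. The second parameter set (params_b), dvalid, and y_valid are placeholders for illustration, not my actual setup; the point is that each candidate model gets its nrounds from its own CV run.

# Each candidate gets its own CV-derived nrounds
params_a <- list(max_depth = 3, eta = 0.1, objective = "reg:tweedie")
params_b <- list(max_depth = 5, eta = 0.05, objective = "reg:tweedie")  # placeholder second candidate

set.seed(1)
cv_a <- xgb.cv(params = params_a, data = dtrain, nrounds = 1000, nfold = 4,
               early_stopping_rounds = 20, metrics = "rmse")
set.seed(2)
cv_b <- xgb.cv(params = params_b, data = dtrain, nrounds = 1000, nfold = 4,
               early_stopping_rounds = 20, metrics = "rmse")

# Final models trained to the (variable) best iteration from each separate CV
model_a <- xgb.train(params = params_a, data = dtrain, nrounds = cv_a$best_iteration)
model_b <- xgb.train(params = params_b, data = dtrain, nrounds = cv_b$best_iteration)

# Holdout comparison; the ranking can move depending on what nrounds each CV happened to return
rmse_a <- sqrt(mean((predict(model_a, dvalid) - y_valid)^2))
rmse_b <- sqrt(mean((predict(model_b, dvalid) - y_valid)^2))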