I am exploring whether feature selection and hyperparameter tuning can be performed simultaneously using multiple performance measures. I am building ecological niche models, for which it is advised to use multiple performance measures rather than just one. I came across this post, but it only uses a single performance measure. One of the comments mentioned: “For each specific feature subset, tune the hyperparameters of the learner in the inner resampling loop, and after you tune them, evaluate them in the middle resampling loop to select the optimal feature subset.”
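If I understand that comment correctly, it describes wrapping the hyperparameter tuning in an AutoTuner and passing that as the learner to the feature selection. For a single measure, my (untested) sketch of that structure would be roughly:

library(mlr3verse)  # attaches mlr3, mlr3tuning, mlr3fselect, ...

# inner loop: tune the hyperparameters for whatever feature subset is passed in
at = auto_tuner(
  tuner = tnr("random_search"),
  learner = lrn("classif.rpart",
    cp = to_tune(1e-04, 1e-1),
    minsplit = to_tune(2, 64),
    maxdepth = to_tune(1, 30)),
  resampling = rsmp("cv", folds = 3),  # inner resampling
  measure = msr("classif.tpr"),        # a single measure here
  term_evals = 30)

# middle loop: evaluate feature subsets with the tuned learner
fs_instance = fsi(
  task = tsk("sonar"),
  learner = at,
  resampling = rsmp("cv", folds = 3),  # middle resampling
  measures = msr("classif.tpr"),
  terminator = trm("evals", n_evals = 30))
fs("random_search")$optimize(fs_instance)

But I do not know whether this structure carries over to multiple measures, which brings me to the examples below.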
In the mlr3 book, an example is given for multi-objective tuning:
instance = ti(
  task = tsk("sonar"),
  learner = lrn("classif.rpart",
    cp = to_tune(1e-04, 1e-1),
    minsplit = to_tune(2, 64),
    maxdepth = to_tune(1, 30)),
  resampling = rsmp("cv", folds = 3),
  measures = msrs(c("classif.tpr", "classif.tnr")),
  terminator = trm("evals", n_evals = 30),
  store_models = TRUE)
tuner = tnr("random_search")
tuner$optimize(instance)
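If I understand multi-objective tuning correctly, this does not return a single best configuration but a set of non-dominated ones, which can be inspected with:

instance$result                  # non-dominated hyperparameter configurations
as.data.table(instance$archive)  # all evaluated configurations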
The equivalent example for feature selection via random search with multiple performance measures would be:
instance = mlr3fselect::fsi(
  task = tsk("sonar"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measures = msrs(c("classif.tpr", "classif.tnr")),
  terminator = trm("evals", n_evals = 30))
fselector = mlr3fselect::fs("random_search")
fselector$optimize(instance)
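Assuming this is valid, I suppose the non-dominated feature subsets could then be inspected in the same way:

instance$result  # non-dominated feature subsets (if my understanding is correct)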
Also, I was wondering whether it is possible to manually nest the two processes. If so, how?
I thought of creating all feature combinations like this:
# all 128 TRUE/FALSE combinations of features V1..V7 (including the empty set)
tested_features <- unique(expand.grid(
  V1 = c(TRUE, FALSE), V2 = c(TRUE, FALSE), V3 = c(TRUE, FALSE),
  V4 = c(TRUE, FALSE), V5 = c(TRUE, FALSE), V6 = c(TRUE, FALSE),
  V7 = c(TRUE, FALSE)))
And then test each feature combination from tsk("sonar") in the task argument of the ti() function. But I don’t think that’s correct.
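To make what I mean by manual nesting concrete, this is the kind of loop I have in mind (a rough, untested sketch; I am assuming that restricting the features with task$clone()$select() is an acceptable way to build the subsets):

task = tsk("sonar")
results = list()

for (i in seq_len(nrow(tested_features))) {
  # features switched on in this combination
  keep = names(tested_features)[unlist(tested_features[i, ])]
  if (length(keep) == 0) next  # skip the empty feature set

  sub_task = task$clone()$select(keep)

  instance = ti(
    task = sub_task,
    learner = lrn("classif.rpart",
      cp = to_tune(1e-04, 1e-1),
      minsplit = to_tune(2, 64),
      maxdepth = to_tune(1, 30)),
    resampling = rsmp("cv", folds = 3),
    measures = msrs(c("classif.tpr", "classif.tnr")),
    terminator = trm("evals", n_evals = 30))
  tnr("random_search")$optimize(instance)

  # keep the non-dominated configurations found for this subset
  results[[i]] = cbind(features = paste(keep, collapse = ","), instance$result)
}

results = data.table::rbindlist(results, fill = TRUE)

But I do not know how the Pareto-optimal results of the different subsets should then be compared to pick one, which is part of why I doubt this is the intended approach.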