I’ve coded a graph that performs stacking using the mlr3
package. The original code, with a reproducible example, can be found here. In summary, in a first step I tuned the hyperparameters of the level 0 learners, and in a final step I used the predictions of the tuned level 0 learners to obtain the predictions of an ensemble learner (i.e., the averaged predictions of the level 0 learners). For the final step, I used mlr3pipelines::LearnerClassifAvg
as follows:
learner_avg <- mlr3pipelines::LearnerClassifAvg$new(id = "classif.avg")
learner_avg$predict_type <- "prob"
My actual data is spatial, so I used the resampling method below to tune the level 0 learners (i.e., step 1):
inner_resampling <- mlr3::rsmp("repeated_sptcv_cstf", folds = 10, repeats = 100)
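For context, step 1 looked roughly like this in my setup (this is a sketch: the tuner choice, the tuned parameter, and the evaluation budget are placeholders; `task_sp` is my spatial classification task):

```r
library(mlr3)
library(mlr3tuning)
library(mlr3spatiotempcv)
library(paradox)

inner_resampling <- mlr3::rsmp("repeated_sptcv_cstf", folds = 10, repeats = 100)

# AutoTuner wrapping one level 0 learner; tuning uses the spatial
# inner resampling so the tuning folds respect spatial structure
tuned_learner_rpart <- mlr3tuning::auto_tuner(
  tuner      = mlr3tuning::tnr("random_search"),
  learner    = mlr3::lrn("classif.rpart", predict_type = "prob",
                         cp = paradox::to_tune(1e-4, 1e-1, logscale = TRUE)),
  resampling = inner_resampling,
  measure    = mlr3::msr("classif.auc"),
  term_evals = 20  # placeholder budget
)
```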
I thought I could reuse this resampling method for the final step, but it doesn’t work: the resampling.method parameter of po("learner_cv") only accepts “cv” or “insample”. For example, the line below fails.
po_learner_glmnet <- mlr3pipelines::po("learner_cv", learner = tuned_learner_glmnet, resampling.method = "sptcv_cstf")
I think the mismatch between the resampling methods used at level 0 (“sptcv_cstf”) and level 1 (“cv”) could be an issue: to be consistent, the cross-validated predictions at level 1 should also be obtained with a spatial resampling method. Is there a solution to this problem?
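One thing I considered (only a sketch, not something I’ve verified end to end) is to bypass po("learner_cv") entirely and build the level 1 features manually from out-of-fold predictions of a spatial resampling; `task_sp` and `tuned_learner_glmnet` refer to my setup:

```r
library(mlr3)
library(mlr3spatiotempcv)
library(data.table)

# Spatial CV producing out-of-fold predictions for one level 0 learner
sp_cv <- mlr3::rsmp("sptcv_cstf", folds = 10)
rr <- mlr3::resample(task_sp, tuned_learner_glmnet, sp_cv)

# Combine the test-fold predictions; the probability columns could then
# be joined back (by row id) to the task as level 1 features
oof <- data.table::as.data.table(rr$prediction())
head(oof)
```

This would give spatially cross-validated predictions for the ensemble step, but it loses the convenience of having everything inside one Graph, which is why I’d prefer a solution within mlr3pipelines.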
If needed, here’s an example of how I constructed the stacking pipelines:
po_learner_glmnet <- mlr3pipelines::po("learner_cv", learner = tuned_learner_glmnet)
po_learner_rpart <- mlr3pipelines::po("learner_cv", learner = tuned_learner_rpart)
graph_level_0 <- mlr3pipelines::gunion(list(po_learner_glmnet, po_learner_rpart)) %>>%
  mlr3pipelines::po("featureunion")
graph_levels_0_and_1 <- graph_level_0 %>>% learner_avg
learner_graph_levels_0_and_1 <- mlr3::as_learner(graph_levels_0_and_1)
learner_graph_levels_0_and_1$train(task_sp)