I noticed that group_bootstraps is fairly slow compared to bootstraps.
As a workaround, I am nesting and unnesting the data, which is a lot faster.
Are version 1 and version 2 equivalent, or am I missing something?
library(dplyr)   # mutate() and pull() come from dplyr
library(tidyr)
library(rsample)
dat <- tibble(x = rep(1:1000, 2), y = 1:2000)
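# Return the mean of `column` as a one-row tibble with the
# `estimate` and `term` columns that int_pctl() expects
f <- function(df, column) {
  tibble(
    "estimate" = mean(pull(df, {{ column }})),
    "term" = "mean"
  )
}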
# Version 1 using rsample::group_bootstraps
start_time <- Sys.time()
dat %>%
  group_bootstraps(group = x, times = 10) %>%
  mutate(mean_stats = purrr::map(splits, ~ f(analysis(.), y))) %>%
  int_pctl(mean_stats)
print(difftime(Sys.time(), start_time, units = "secs"))
# Version 2 using tidyr nest and unnest
start_time <- Sys.time()
dat %>%
  nest(.by = x) %>%
  bootstraps(times = 10) %>%
  mutate(mean_stats = purrr::map(splits, ~ f(unnest(analysis(.), data), y))) %>%
  int_pctl(mean_stats)
print(difftime(Sys.time(), start_time, units = "secs"))
- version 1 takes more than 9 seconds on my machine
- version 2 takes less than 0.5 seconds on my machine