I am using mclapply to run a several linear regressions. Each child process from mclapply takes a dataset from the global environment, subsets that data, and then performs the regression on that subset of data.
I am using glmer(), part of the lme4 package, and I am also using a function called allFit(), where I can use several different optimizers to achieve convergence of the model.
My data is subset within the function and then, in the same function, I run the regression analysis with allFit().
The problem is that allFit()doesn’t work well with mclapply. For some reason, even if I explicitly pass my data to glmer(), allFit() and glmer() can’t locate the subsetted data. The only way I get get allFit() to work is to use the <<- operator to assign my data to the global environment. However, my worry is that perhaps allFit() could pass the wrong set of data to the model, since <<- assigns the data to the global environment.
I know what the environment for each child process of mclapply is cloned into a new environment, but I’m not sure if something assigned via the <<- operator stays in the child environment, or is it sent back to the parent environment? Does anyone know about this?
Thank you.
Here’s an example of how my code looks:
run_regressions <- function(species, phenotype, data){
# subset data
data_to_model <-
data %>%
select(all_of(species), all_of(phenotype), Timepoint_categorical, dna_conc, NEXT_ID, BATCH_NUMBER, clean_reads_FQ_1) %>%
filter(!if_any(everything(), is.na))
# assign data to global environment
data_to_model <<- data_to_model
# Run Model 1
m1 <- list()
diff_optims <- lme4::allFit(glmer(as.formula(paste(species, "~", phenotype, "+ Timepoint_categorical + dna_conc + clean_reads_FQ_1 + BATCH_NUMBER + (1|NEXT_ID)")), # try all optimizers
data = data_to_model,
family = binomial,
control=glmerControl(optCtrl=list(maxfun=10000))),
verbose=TRUE)
diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")] # is(), a function to see if an object inherits from a specific class, in this case"merMod".
lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
convergence_results <- lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
working_indices <- sapply(convergence_results, is.null)
if(sum(working_indices) == 0){
print("No algorithms from allFit converged. You may still be able to use the results, but proceed with extreme caution.")
m1$model <- NULL
m1$success <- FALSE
} else {
m1$model <- diff_optims[working_indices][[1]] # take the first fit
m1$success <- TRUE
}
# Run Model 2
m2 <- list()
diff_optims <- lme4::allFit(glmer(as.formula(paste(species, "~ + Timepoint_categorical + dna_conc + clean_reads_FQ_1 + BATCH_NUMBER + (1|NEXT_ID)")),
data = data_to_model,
family = binomial,
control=glmerControl(optCtrl=list(maxfun=10000))),
verbose=TRUE)
diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")] # is(), a function to see if an object inherits from a specific class, in this case"merMod".
convergence_results <- lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
working_indices <- sapply(convergence_results, is.null)
if(sum(working_indices) == 0){
print("No algorithms from allFit converged. You may still be able to use the results, but proceed with extreme caution.")
m2$model <- NULL
m2$success <- FALSE
} else {
m2$model <- diff_optims[working_indices][[1]] # take the first fit
m2$success <- TRUE
}
model_results = list(m1=m1, m2=m2, model_comparison=model_comparison, model_comparison_tidied=model_comparison_tidied)
}
# create combinations of species and phenotypes to feed to mclapply
combinations <- expand.grid(species = species_of_interest, phenotype = phenotypes_to_test)
# Initialize a list to store results
results_list <- list()
# execute in parallel with mclapply
results_list <-
mclapply(seq_len(nrow(combinations)), function(..i) {
species <- combinations[..i,"species"]
phenotype <- combinations[..i,"phenotype"]
tryCatch({
run_log_mod(species = species, phenotype = phenotype, data = data)
}, error = function(e) {
message(paste("Error in iteration", ..i, ":", e$message))
return(NULL) # or any other default value you want to return in case of an error
})
}, mc.cores = 6)