I want to use parallel computations in R. When running in the global environment (outside of functions), the behaviour was as expected: only explicitly exported objects are seen by, e.g., the parSapply function.
In the following example, the variable V2 was not exported, so the second parSapply call failed.
V1 <- 4
V2 <- 7
cl <- snow::makeCluster(2)
snow::clusterExport(cl, 'V1', envir = environment())
x <- snow::parSapply(cl, 1, function(x) paste("value =", V1))
print(x)
# V2 was not exported
# expected error!
x <- snow::parSapply(cl, 1, function(x) paste("value =", V2))
# Error in checkForRemoteErrors(val) :
# one node produced an error: object 'V2' not found
print(x)
snow::stopCluster(cl)
However, when working within a function, all objects in the function's environment appear to be accessible to the parSapply call, not just the exported ones.
testfunc3 <- function() {
  V1 <- 4
  V2 <- 7
  cl <- snow::makeCluster(2)
  snow::clusterExport(cl, 'V1', envir = environment())
  print(unlist(snow::clusterEvalQ(cl, ls())))
  x <- snow::parSapply(cl, 1, function(x) paste("value =", V1))
  print(x)
  # I expect an error here because V2 was not exported
  x <- snow::parSapply(cl, 1, function(x) paste("value =", V2))
  print(x)
  snow::stopCluster(cl)
  return(invisible(NULL))
}
testfunc3()
# [1] "V1" "V1"
# [1] "value = 4"
# [1] "value = 7"
The line print(unlist(snow::clusterEvalQ(cl, ls()))) shows that only V1 is present on the two workers, yet I can still use the variable V2. Is this behaviour expected?
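My guess (and it is only a guess, not something I found in the snow documentation) is that the anonymous function passed to parSapply carries its enclosing environment, i.e. the evaluation frame of testfunc3, so serializing the function to the workers also ships V1 and V2. A minimal base-R sketch of what I mean:

# A function created inside a local environment captures that environment
# as its enclosure, so V1 and V2 travel with the function when it is
# serialized, even though the function body only mentions V2.
f <- local({
  V1 <- 4
  V2 <- 7
  function(x) paste("value =", V2)
})
ls(environment(f))          # "V1" "V2" -- both captured, not just V2
length(serialize(f, NULL))  # grows with the size of the captured objects

If that is right, it would also explain the global-environment case: there the anonymous function's enclosure is the global environment, which (as far as I understand) is not serialized to the workers, so only explicitly exported objects are visible.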
The problem is that when the objects inside the function are very large, all of them are shipped to the workers, which 1) is not needed; 2) takes time; and 3) may crash the function because of memory limits.
Is there a way to explicitly control which objects are distributed to the processing cores?
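One workaround I have been experimenting with (I do not know whether it is the recommended approach; testfunc4 and the helper names are my own) is to give the worker function a small, explicit environment so that it stops capturing the whole function frame:

testfunc4 <- function() {
  V1 <- 4
  V2 <- 7  # stands in for a large object we do NOT want shipped
  cl <- snow::makeCluster(2)
  f <- function(x) paste("value =", V1)
  # Replace the captured frame with a minimal environment holding only V1;
  # V2 is then unreachable from f and should not be serialized with it.
  e <- new.env(parent = globalenv())
  e$V1 <- V1
  environment(f) <- e
  x <- snow::parSapply(cl, 1, f)
  print(x)
  snow::stopCluster(cl)
  invisible(NULL)
}

With this setup, a worker function that referenced V2 should fail again, as in the global-environment case, which is the behaviour I expected in the first place.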