I am trying to use foreach to parallelize some matrix calculations. However, I found that it gives similar performance to sapply and a for loop, even though I used 10 cores.
library(microbenchmark)
library(foreach)
library(doParallel)
mat = matrix(rnorm(3000 * 3000), 3000) # 3000 X 3000
g = rnorm(3000) # 3000
# use 10 cores
cl <- makeCluster(10)
registerDoParallel(cl)
microbenchmark(
  # multi-core with foreach
  foreach(i = 1:100, .combine = c) %dopar% {t(g) %*% mat %*% g},
  # sapply
  sapply(1:100, function(i){t(g) %*% mat %*% g}),
  # for loop
  for(i in 1:100){t(g) %*% mat %*% g},
  times = 20)
stopCluster(cl)
On the other hand, if I use a different function (Sys.sleep()), foreach is indeed about 10x faster than sapply and a for loop.
cl <- makeCluster(10)
registerDoParallel(cl)
microbenchmark(
  # multi-core with foreach
  foreach(i = 1:100, .combine = c) %dopar% {Sys.sleep(0.01)},
  # sapply
  sapply(1:100, function(i){Sys.sleep(0.01)}),
  # for loop
  for(i in 1:100){Sys.sleep(0.01)},
  times = 20)
stopCluster(cl)
What is the reason for this difference, and how can I improve the performance of the matrix calculation?
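
In case it helps with diagnosing this, here is a minimal sketch of how the cost of a single task and the size of the matrix could be checked, since both presumably affect how much work each worker gets relative to the data it needs. It assumes the same 3000 x 3000 setup as above; the seed, the variable names single, res and res2, and the crossprod() comparison are my own additions for illustration and are not part of the benchmark.

```r
# Sketch: time one evaluation of the quadratic form and inspect the size
# of the matrix that each worker process would need a copy of.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
n <- 3000
mat <- matrix(rnorm(n * n), n)  # n x n matrix of doubles
g <- rnorm(n)                   # vector of length n

# Elapsed time (in seconds) of a single t(g) %*% mat %*% g
single <- system.time(res <- t(g) %*% mat %*% g)[["elapsed"]]
print(single)

# Memory footprint of the matrix (about 8 bytes per double)
print(format(object.size(mat), units = "MB"))

# crossprod(g, mat) computes t(g) %*% mat without the explicit transpose,
# so this should give the same 1 x 1 result
res2 <- crossprod(g, mat) %*% g
stopifnot(isTRUE(all.equal(res, res2)))
```

If the single-task time turns out to be small compared with the 0.01 s used in the Sys.sleep() test, that might be part of why the two benchmarks behave so differently, but I have not confirmed this.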