This is my first post on Stack Overflow, so please let me know if I should improve anything about it.
On macOS (Apple M1 Max, 10 cores, 64 GB of RAM), I am trying to run NbClust in R on a small matrix. It works fine on a single core: it does not use much memory (as expected) and CPU usage stays below 100% for the one core it runs on, so NbClust does not appear to do any internal multi-processing that could multiply with the processes started by bplapply.
However, when I run it in parallel over multiple cores (using forking, as is the default on macOS), memory quickly climbs to around 30-60 GB.
Here is a minimal example. The first part, with workers = 1, works fine; the second part bloats memory excessively. Any ideas why?
library(BiocParallel)
library(NbClust)

# Create a small matrix of random positive numbers
mat <- matrix(runif(30 * 100, min = 0, max = 100),
              nrow = 30, ncol = 100)

# Indices to evaluate with NbClust
indeces <- c("kl", "ch", "hartigan",
             "cindex", "db", "silhouette", "duda", "pseudot2",
             "ratkowsky", "ball", "ptbiserial", "gap",
             "mcclain", "gamma", "gplus", "tau", "dunn",
             "sdindex", "sdbw")
### Works fine ##############################################
param <- BiocParallel::MulticoreParam(workers = 1,
                                      progressbar = TRUE)
test <- BiocParallel::bplapply(
  X = 1:10,
  BPPARAM = param,
  function(i) {
    clust_results <- sapply(indeces, function(x) {
      try({
        NbClust::NbClust(data = mat, min.nc = 1, max.nc = 5,
                         method = "ward.D2", index = x)
      }, silent = TRUE)
    })
  }
)
### Excessive memory consumption ##############################
param <- BiocParallel::MulticoreParam(workers = 8,
                                      progressbar = TRUE)
test <- BiocParallel::bplapply(
  X = 1:10,
  BPPARAM = param,
  function(i) {
    clust_results <- sapply(indeces, function(x) {
      try({
        NbClust::NbClust(data = mat, min.nc = 1, max.nc = 5,
                         method = "ward.D2", index = x)
      }, silent = TRUE)
    })
  }
)
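For comparison, here is a minimal sketch of the same run using socket-based workers (BiocParallel::SnowParam) instead of forked ones, in case the behaviour differs between the two backends. Passing mat and indeces explicitly to the worker function is my assumption, since socket workers do not share the parent process's memory; I have not confirmed whether this variant avoids the memory blow-up.

# Sketch: same computation with socket (SNOW) workers instead of forking.
# mat and indeces are passed explicitly because socket workers do not
# inherit the parent R session's objects (assumption for this sketch).
param_snow <- BiocParallel::SnowParam(workers = 8,
                                      progressbar = TRUE)
test_snow <- BiocParallel::bplapply(
  X = 1:10,
  FUN = function(i, mat, indeces) {
    sapply(indeces, function(x) {
      try({
        NbClust::NbClust(data = mat, min.nc = 1, max.nc = 5,
                         method = "ward.D2", index = x)
      }, silent = TRUE)
    })
  },
  mat = mat,
  indeces = indeces,
  BPPARAM = param_snow
)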