I have a loop in R that involves many complex operations, and it is slow. The pseudocode for it looks like this:
library(data.table)

set.seed(123)
n_rows <- 100000
n_cols <- 20000
# requires a lot of memory (~16 GB per matrix)
dt1 <- as.data.table(matrix(runif(n_rows * n_cols), nrow = n_rows))
dt2 <- as.data.table(matrix(runif(n_rows * n_cols), nrow = n_rows))
dt3 <- as.data.table(matrix(runif(n_rows * n_cols), nrow = n_rows))
dim(dt1)
# run on the first 100 rows here
sapply(1:100, function(x) {
# just a toy run
cat("x =", x, "\n")
a1 <- dt1[x,]
a2 <- dt2[x,]
a3 <- dt3[x,]
x1 <- a1 + a2 + a3
x2 <- sum(x1) + sum(a1[x>0])
x3 <- x2 + sum(a2[x>0])
return(x3)
})
It's very slow, so I ran profvis to find the time-consuming steps.
library(profvis)
profvis({
sapply(1:100, function(x) {
# just a toy run
cat("x =", x, "\n")
a1 <- dt1[x,]
a2 <- dt2[x,]
a3 <- dt3[x,]
x1 <- a1 + a2 + a3
x2 <- sum(x1) + sum(a1[x>0])
x3 <- x2 + sum(a2[x>0])
return(x3)
})
})
It seems that selecting rows with data.table (dt1[x,] etc.) is taking a significant amount of time.
I have tried:
- using data.table (as suggested in: Fast way to select rows within table in R?)
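For reference, here is a minimal sketch of the main alternative I am considering: keeping the data as plain numeric matrices and indexing rows directly, since m[x, ] on a matrix avoids the per-call overhead of data.table's [ method. This is not a confirmed solution, just the toy computation from above rewritten with shrunken sizes so it runs quickly.

```r
# Sketch only: matrices instead of data.tables, sizes shrunk for a quick run.
set.seed(123)
n_rows <- 100
n_cols <- 50
m1 <- matrix(runif(n_rows * n_cols), nrow = n_rows)
m2 <- matrix(runif(n_rows * n_cols), nrow = n_rows)
m3 <- matrix(runif(n_rows * n_cols), nrow = n_rows)

res <- sapply(1:100, function(x) {
  a1 <- m1[x, ]   # plain numeric vector, cheap to extract
  a2 <- m2[x, ]
  a3 <- m3[x, ]
  x1 <- a1 + a2 + a3
  x2 <- sum(x1) + sum(a1[x > 0])
  x3 <- x2 + sum(a2[x > 0])
  x3
})

length(res)
```

The toy logic is unchanged; only the container type differs, so profvis can be rerun on this version to compare the row-selection cost directly.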