I have written a function that takes a numeric 2D array (dimension n x m) as input and calculates the cross product of every pair of rows of this array, including each row with itself. The output is a 2D array (dimension n x n). This function becomes increasingly slow as I scale n and m.
Further background that might be useful:
I am running a procedure that needs to be performed in batches due to memory constraints. I want to track certain metrics for each batch and then calculate statistics across all batches, in particular the correlation of my row variables. To do this, for each batch I calculate the sum of the x_i y_i terms (structured as an n x n matrix) from the numerator of the Pearson correlation coefficient formula. I then keep a running total of this matrix across all batches and calculate the overall correlation at the end of the final batch.
I have included the sample code below:
```
input_data <- array(runif(1000, 0, 1), dim = c(25, 40))

calc_prod_matrix <- function(data) {
  mat <- matrix(NA, nrow = nrow(data), ncol = nrow(data))
  mat[] <- array(mapply(function(i, j) crossprod(data[i, ], data[j, ]),
                        i = row(mat), j = col(mat)),
                 dim = dim(mat))
  return(mat)
}

output <- calc_prod_matrix(input_data)
```
Is there a way to speed up this code, or a different, more efficient way of doing it?
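For context, the batch bookkeeping described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions: each batch holds the same n row variables and a different set of observations in its columns, and the names `batches`, `sum_xy`, `total_xy`, `total_x` and `total_obs` are made up for this example. `sum_xy` is a deliberately simple stand-in for `calc_prod_matrix` so that the block runs on its own.

```r
# Hypothetical data: 3 row variables, 2 batches of 5 observations each
set.seed(1)
batches <- list(matrix(runif(15), nrow = 3), matrix(runif(15), nrow = 3))

# Stand-in for calc_prod_matrix: entry (i, j) is sum(row i * row j) for one batch
sum_xy <- function(data) {
  n <- nrow(data)
  sapply(seq_len(n), function(j) {
    sapply(seq_len(n), function(i) sum(data[i, ] * data[j, ]))
  })
}

n <- nrow(batches[[1]])
total_xy  <- matrix(0, n, n)  # running sum of the x_i y_i terms
total_x   <- numeric(n)       # running sum of each row variable
total_obs <- 0                # running number of observations (columns)

for (b in batches) {
  total_xy  <- total_xy + sum_xy(b)
  total_x   <- total_x + rowSums(b)
  total_obs <- total_obs + ncol(b)
}

# Pearson correlation from the accumulated sums:
# numerator   = sum(xy) - sum(x) * sum(y) / N
# denominator = sqrt of the product of the two diagonal (variance) terms
cov_num   <- total_xy - outer(total_x, total_x) / total_obs
total_cor <- cov_num / sqrt(outer(diag(cov_num), diag(cov_num)))
```

With this toy data, `total_cor` should agree with `cor(t(do.call(cbind, batches)))`, i.e. the correlation computed on all observations at once. The per-batch `sum_xy` step is the part that corresponds to the slow function in my question.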