I have a vector in which elements are assigned as either 1 or 0
For example
V1 <- c(1,0,1,1,0)
I want to generate a matrix M of size V1 x V1 in which each element is coded 1 if the values of the vector elements are equal e.g. if V1(1)=1 & V1(3)=1 then M(1,3)=1
In the case above
M = (1,0,1,1,0
0,1,0,0,1
1,0,1,1,0
1,0,1,1,0
0,1,0,0,1)
My actual matrix will have 44 elements and a will simulate it around 10,000 times so I need code that runs quickly.
Thanks
Paul
I could code this in a for loop but this would be very inefficient
Paul Mee is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
2
Try outer
+(outer(V1, V1, "=="))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 0 1 0 0 1
[3,] 1 0 1 1 0
[4,] 1 0 1 1 0
[5,] 0 1 0 0 1
outer(V1, V1, FUN = (x, y) as.integer(x == y))
Benchmarks (assuming 44 elements in the input vector):
V2 <- sample(V1, size = 44, replace = TRUE)
bench::mark(
Andre = +(outer(V2, V2, "==")),
SB = outer(V2, V2, FUN = (x, y) as.integer(x == y)),
Sam = createMatrix(V2),
Abin = create_matrix(V2)
)
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <bch:t> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 Andre 18.1µs 20µs 38110. 45.78KB 38.1 9990 10 262ms <int[…]> <Rprofmem> <bench_tm> <tibble>
2 SB 16.6µs 18.4µs 47984. 45.78KB 52.8 9989 11 208ms <int[…]> <Rprofmem> <bench_tm> <tibble>
3 Sam 4µs 4.6µs 204136. 7.61KB 20.4 9999 1 49ms <int[…]> <Rprofmem> <bench_tm> <tibble>
4 Abin 21.8µs 24.5µs 39049. 68.52KB 62.6 9984 16 256ms <dbl[…]> <Rprofmem> <bench_tm> <tibble>
As you want efficiency, here is an Rcpp
solution. I tried getting fancy and using bitwise operations as x[i] == x[j]
is equivalent to !(x[i] ^ x[j])
if all values are zeroes and ones, but it wasn’t faster than direct comparison. In any case, this may be quicker than outer()
. It returns an integer matrix.
Rcpp::sourceCpp(code = "
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerMatrix createMatrix(NumericVector x) {
int n = x.size();
IntegerMatrix M(n, n);
for(int i = 0; i < n; i++) {
for(int j = 0; j < n; j++) {
M(i, j) = (x[i] == x[j]) ? 1 : 0;
}
}
return M;
}
")
createMatrix(V1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 1 1 0
# [2,] 0 1 0 0 1
# [3,] 1 0 1 1 0
# [4,] 1 0 1 1 0
# [5,] 0 1 0 0 1
I recently had to do something similar for a project. 🙂 So, you can approach it in 2 steps basically. First is to have a vectorised way to generate the matrix. And the next step is of course the simulation part.
For creating the matrix, you can use outer
function in combination with matrix
function. Code below:
V1 <- c(1, 0, 1, 1, 0)
create_matrix <- function(V) {
M <- outer(V, V, FUN = "==")
M <- as.numeric(M)
matrix(M, nrow = length(V), ncol = length(V))
}
M <- create_matrix(V1)
And then for the simulation part, you can use replicate
function. Code below:
matrices_10000 <- replicate(10000, create_matrix(V1), simplify = FALSE)
The key part in this is to use vectorisation as much as possible. About outer
, you can find more info about it here: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/outer
Two fast options (see benchmarks below).
A fast base solution:
+(tcrossprod(V1 + 1) != 2)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 1 1 0
#> [2,] 0 1 0 0 1
#> [3,] 1 0 1 1 0
#> [4,] 1 0 1 1 0
#> [5,] 0 1 0 0 1
A fast Rcpp solution:
Rcpp::sourceCpp(code = "
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerMatrix v2mat(const IntegerVector& x) {
const int n = x.size();
IntegerVector v0 (n);
IntegerVector v1 (n);
int n0 = 0;
int n1 = 0;
IntegerMatrix M = IntegerMatrix::diag(n, 1);
for(int i = 0; i < 3; i++) {
if (x(i) == 1) {
for (int j = 0; j < n1; j++) {
M(i, v1(j)) = 1;
M(v1(j), i) = 1;
}
v1(n1++) = i;
} else {
for (int j = 0; j < n0; j++) {
M(i, v0(j)) = 1;
M(v0(j), i) = 1;
}
v0(n0++) = i;
}
}
return M;
}
")
Benchmarking against a couple of the other answers:
V1 <- sample(0:1, 44, 1)
bench::mark(
outer = +(outer(V1, V1, "==")),
tcrossprod = +(tcrossprod(V1 + 1) != 2),
createMatrix = createMatrix(V1),
v2mat = v2mat(V1)
)
bench::mark(
outer = +(outer(V1, V1, "==")),
tcrossprod = +(tcrossprod(V1 + 1) != 2),
createMatrix = createMatrix(V1),
v2mat = v2mat(V1)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 outer 12.7µs 19.6µs 49472. 62.9KB 24.7
#> 2 tcrossprod 6.6µs 13.8µs 71375. 30.8KB 35.7
#> 3 createMatrix 5.6µs 5.8µs 165304. 8KB 33.1
#> 4 v2mat 4.3µs 4.5µs 210224. 12.2KB 42.1