Is there a faster way to find the first value that is not NA in a large vector using base R?

Just like the question says. Is there a faster way to do what is done below when the vector size is very large (> 10M entries) using base R in the general case?

The code below works, but it does not short-circuit, so when the vector size grows it becomes slow for what should be obvious reasons. In this particular example a for loop might be faster but if the first NA value is very far from the start of the vector maybe not…

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>set.seed(1)
x <- c(rep(NA, 3), sample(c(T,F), size=5e7, replace=T))
min(which(!is.na(x))) #4
</code>
<code>set.seed(1) x <- c(rep(NA, 3), sample(c(T,F), size=5e7, replace=T)) min(which(!is.na(x))) #4 </code>
set.seed(1)
x <- c(rep(NA, 3), sample(c(T,F), size=5e7, replace=T))
min(which(!is.na(x))) #4

6

An Rcpp for loop should be fast even if the first non-NA value is not particularly near the start. The basic idea is that you do not need to keep iterating after you have found the first value.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>Rcpp::cppFunction("int which_non_na(LogicalVector x) {
R_xlen_t n = x.size();
for (R_xlen_t i = 0; i < n; ++i) {
if (!LogicalVector::is_na(x[i])) {
return i + 1; // +1 for 1-indexing in R
}
}
return NA_INTEGER; // if no non-NA values are found
}")
which_non_na(x) # 4
</code>
<code>Rcpp::cppFunction("int which_non_na(LogicalVector x) { R_xlen_t n = x.size(); for (R_xlen_t i = 0; i < n; ++i) { if (!LogicalVector::is_na(x[i])) { return i + 1; // +1 for 1-indexing in R } } return NA_INTEGER; // if no non-NA values are found }") which_non_na(x) # 4 </code>
Rcpp::cppFunction("int which_non_na(LogicalVector x) {
    R_xlen_t n = x.size();
    for (R_xlen_t i = 0; i < n; ++i) {
        if (!LogicalVector::is_na(x[i])) {
            return i + 1; // +1 for 1-indexing in R
        }
    }
    return NA_INTEGER; // if no non-NA values are found
}")

which_non_na(x) # 4

Benchmarks

Here are some benchmarks for some vectors of length 1e4 to 1e8 and between 1e3 and 1e6 non-NA values. With a vector of length 10,000 and 1,000 non-NA values it is around 50 times faster than the base R approach in the question. For the shortest vectors, the approach is around the same speed (or even slightly slower than) the pure R nonNA() function in the answer by Carl Witthoft. We see the benefits with the longest vectors and the most non-NA values (so we can exit earlier).

With a vector of length 100 million and 1 million non-NA values, the median time of the Rcpp approach is around 600,000 times faster than the approach in the question. The pure R nonNA() function falls further behind Rcpp (although it’s still decent and only 7 times as slow). The pure R min(which.min(x), which.max(x), Inf) approach by ThomasIsCoding is pretty close with a relative speed of 1.86 compared with Rcpp.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>set.seed(1)
nonNA <- (x) { for (jn in 1:length(x)) { if (!is.na(x[jn])) { return(jn) } }; NA }
results <- bench::press(
vec_length = 10^(4:8),
num_non_na = c(1e3, 1e5, 1e6),
{
# create vector with non-NA
if (num_non_na >= vec_length) num_non_na <- vec_length - 1
x <- sample(c(rep(c(TRUE, FALSE), length.out = num_non_na), rep(NA, vec_length - num_non_na)))
bench::mark(
relative = TRUE,
base_r = min(which(!is.na(x))),
which_min = min(which.min(x), which.max(x), Inf),
which_1 = which(!is.na(x))[1],
position = Position((x) !is.na(x), x),
non_na = nonNA(x),
rcpp = which_non_na(x)
)
}
)
</code>
<code>set.seed(1) nonNA <- (x) { for (jn in 1:length(x)) { if (!is.na(x[jn])) { return(jn) } }; NA } results <- bench::press( vec_length = 10^(4:8), num_non_na = c(1e3, 1e5, 1e6), { # create vector with non-NA if (num_non_na >= vec_length) num_non_na <- vec_length - 1 x <- sample(c(rep(c(TRUE, FALSE), length.out = num_non_na), rep(NA, vec_length - num_non_na))) bench::mark( relative = TRUE, base_r = min(which(!is.na(x))), which_min = min(which.min(x), which.max(x), Inf), which_1 = which(!is.na(x))[1], position = Position((x) !is.na(x), x), non_na = nonNA(x), rcpp = which_non_na(x) ) } ) </code>
set.seed(1)
nonNA <- (x) { for (jn in 1:length(x)) { if (!is.na(x[jn])) { return(jn) } }; NA }
results <- bench::press(
    vec_length = 10^(4:8),
    num_non_na = c(1e3, 1e5, 1e6),
    {
        # create vector with non-NA
        if (num_non_na >= vec_length) num_non_na <- vec_length - 1
        x <- sample(c(rep(c(TRUE, FALSE), length.out = num_non_na), rep(NA, vec_length - num_non_na)))

        bench::mark(
            relative = TRUE,
            base_r = min(which(!is.na(x))),
            which_min = min(which.min(x), which.max(x), Inf),
            which_1 = which(!is.na(x))[1],
            position = Position((x) !is.na(x), x),
            non_na = nonNA(x),
            rcpp = which_non_na(x)
        )
    }
)

Output:

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code> expression vec_length num_non_na min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <bch:tm>
1 base_r 10000 1000 54.2 57.8 1 Inf Inf 9978 22 193.91ms
2 which_min 10000 1000 1.63 1.66 36.6 NaN NaN 10000 0 5.31ms
3 which_1 10000 1000 53.0 55.9 1.16 Inf Inf 9977 23 167.24ms
4 position 10000 1000 2.86 3.11 20.0 NaN NaN 10000 0 9.71ms
5 non_na 10000 1000 1 1 51.3 Inf NaN 10000 0 3.79ms
6 rcpp 10000 1000 1.06 1.07 47.3 NaN NaN 10000 0 4.11ms
7 base_r 100000 1000 326. 353. 1.00 Inf Inf 3449 83 397.84ms
8 which_min 100000 1000 1.73 1.73 200. NaN NaN 10000 0 5.79ms
9 which_1 100000 1000 315. 358. 1 Inf Inf 3366 82 389.22ms
10 position 100000 1000 7.68 8.26 41.4 NaN Inf 9999 1 27.96ms
11 non_na 100000 1000 2.35 2.52 137. NaN NaN 10000 0 8.42ms
12 rcpp 100000 1000 1 1 345. NaN NaN 10000 0 3.35ms
13 base_r 1000000 1000 1257. 1514. 1.00 Inf Inf 223 82 302.37ms
14 which_min 1000000 1000 2.76 2.73 629. NaN NaN 10000 0 21.6ms
15 which_1 1000000 1000 1278. 1491. 1 Inf Inf 226 76 306.97ms
16 position 1000000 1000 634. 643. 2.68 NaN Inf 917 23 463.92ms
17 non_na 1000000 1000 134. 143. 12.0 NaN Inf 3787 41 429.6ms
18 rcpp 1000000 1000 1 1 1720. NaN NaN 10000 0 7.9ms
19 base_r 10000000 1000 10613. 11683. 1 Inf Inf 25 25 518.79ms
20 which_min 10000000 1000 3.23 3.23 3493. NaN NaN 10000 0 59.41ms
21 which_1 10000000 1000 10445. 10436. 1.10 Inf Inf 27 27 510.68ms
22 position 10000000 1000 882. 921. 11.1 NaN Inf 269 22 500.75ms
23 non_na 10000000 1000 189. 205. 49.8 NaN Inf 1201 42 500.07ms
24 rcpp 10000000 1000 1 1 11100. NaN NaN 10000 0 18.7ms
25 base_r 100000000 1000 39833. 39765. 1.01 Inf Inf 3 6 552.89ms
26 which_min 100000000 1000 10.3 10.3 3955. NaN NaN 10000 0 468.35ms
27 which_1 100000000 1000 40767. 40599. 1 Inf Inf 3 6 555.68ms
28 position 100000000 1000 1020. 1095. 34.2 NaN Inf 93 21 503.13ms
29 non_na 100000000 1000 211. 230. 164. NaN Inf 444 43 500.34ms
30 rcpp 100000000 1000 1 1 40816. NaN NaN 10000 0 45.38ms
31 base_r 10000 100000 57.8 74.9 1 Inf Inf 9998 2 295.12ms
32 which_min 10000 100000 1.53 1.54 60.2 NaN NaN 10000 0 4.9ms
33 which_1 10000 100000 47.3 50.9 1.93 Inf Inf 9998 2 153ms
34 position 10000 100000 2.85 2.97 32.3 NaN NaN 10000 0 9.13ms
35 non_na 10000 100000 1 1 92.9 NaN NaN 10000 0 3.18ms
36 rcpp 10000 100000 1.06 1.07 85.3 NaN NaN 10000 0 3.46ms
37 base_r 100000 100000 573. 593. 1 Inf Inf 2640 5 489.41ms
38 which_min 100000 100000 1.55 1.55 379. NaN NaN 10000 0 4.9ms
39 which_1 100000 100000 453. 479. 1.31 Inf Inf 3453 7 488.38ms
40 position 100000 100000 2.87 2.98 202. NaN NaN 10000 0 9.16ms
41 non_na 100000 100000 1 1 578. NaN NaN 10000 0 3.21ms
42 rcpp 100000 100000 1.06 1.06 578. NaN NaN 10000 0 3.21ms
43 base_r 1000000 100000 5830. 6048. 1 Inf Inf 244 5 482.78ms
44 which_min 1000000 100000 1.48 1.49 3903. NaN NaN 10000 0 5.07ms
45 which_1 1000000 100000 5717. 5937. 1.06 Inf Inf 261 4 487.61ms
46 position 1000000 100000 13.0 13.3 463. NaN Inf 9998 2 42.69ms
47 non_na 1000000 100000 3.48 3.65 1660. NaN NaN 10000 0 11.92ms
48 rcpp 1000000 100000 1 1 5784. NaN Inf 9999 1 3.42ms
49 base_r 10000000 100000 40802. 40182. 1 Inf Inf 24 7 367.53ms
50 which_min 10000000 100000 1.85 1.83 22655. NaN NaN 10000 0 6.76ms
51 which_1 10000000 100000 40334. 40123. 1.02 Inf Inf 23 10 344.07ms
52 position 10000000 100000 92.3 91.3 471. NaN Inf 9984 16 324.88ms
53 non_na 10000000 100000 20.3 21.1 2043. NaN Inf 9993 7 74.92ms
54 rcpp 10000000 100000 1 1 42212. NaN NaN 10000 0 3.63ms
55 base_r 100000000 100000 374396. 366226. 1.05 Inf Inf 3 6 548.73ms
56 which_min 100000000 100000 4.40 4.35 88633. NaN NaN 10000 0 21.62ms
57 which_1 100000000 100000 373680. 366947. 1 Inf Inf 3 6 574.97ms
58 position 100000000 100000 359. 364. 1029. NaN Inf 2685 23 500.11ms
59 non_na 100000000 100000 75.7 80.2 4240. NaN Inf 10000 38 452.07ms
60 rcpp 100000000 100000 1 1 387345. NaN NaN 10000 0 4.95ms
61 base_r 10000 1000000 59.1 63.5 1 Inf Inf 9998 2 210.55ms
62 which_min 10000 1000000 1.55 1.55 42.9 NaN NaN 10000 0 4.91ms
63 which_1 10000 1000000 45.6 50.6 1.42 Inf Inf 9998 2 148.62ms
64 position 10000 1000000 2.85 2.94 23.3 NaN NaN 10000 0 9.03ms
65 non_na 10000 1000000 1 1 66.8 NaN NaN 10000 0 3.15ms
66 rcpp 10000 1000000 1.06 1.06 64.7 NaN NaN 10000 0 3.25ms
67 base_r 100000 1000000 538. 562. 1 Inf Inf 2801 6 489.45ms
68 which_min 100000 1000000 1.46 1.55 247. NaN NaN 10000 0 7.08ms
69 which_1 100000 1000000 432. 456. 1.22 Inf Inf 3404 7 488.07ms
70 position 100000 1000000 2.71 2.91 185. NaN NaN 10000 0 9.44ms
71 non_na 100000 1000000 1 1 520. NaN NaN 10000 0 3.36ms
72 rcpp 100000 1000000 1.06 1.07 504. NaN NaN 10000 0 3.47ms
73 base_r 1000000 1000000 5571. 5799. 1 Inf Inf 260 6 481.33ms
74 which_min 1000000 1000000 1.54 1.52 3729. NaN NaN 10000 0 4.96ms
75 which_1 1000000 1000000 4557. 4637. 1.29 Inf Inf 336 7 481.23ms
76 position 1000000 1000000 2.84 2.89 2004. NaN NaN 10000 0 9.24ms
77 non_na 1000000 1000000 1 1 5494. NaN NaN 10000 0 3.37ms
78 rcpp 1000000 1000000 1.06 1.02 5732. NaN NaN 10000 0 3.23ms
79 base_r 10000000 1000000 69156. 69066. 1 Inf Inf 17 5 377.72ms
80 which_min 10000000 1000000 1.46 1.47 43589. NaN NaN 10000 0 5.1ms
81 which_1 10000000 1000000 67884. 66517. 1.05 Inf Inf 19 5 400.27ms
82 position 10000000 1000000 4.03 4.16 15632. NaN NaN 10000 0 14.21ms
83 non_na 10000000 1000000 1.32 1.31 44276. NaN NaN 10000 0 5.02ms
84 rcpp 10000000 1000000 1 1 66408. NaN Inf 9999 1 3.35ms
85 base_r 100000000 1000000 662166. 619018. 1.04 Inf Inf 3 4 635.95ms
86 which_min 100000000 1000000 1.90 1.86 326808. NaN NaN 10000 0 6.72ms
87 which_1 100000000 1000000 589090. 557688. 1 Inf Inf 3 3 659.17ms
88 position 100000000 1000000 31.4 31.0 19264. NaN Inf 10000 4 114.06ms
89 non_na 100000000 1000000 7.14 7.59 74081. NaN Inf 10000 2 29.66ms
90 rcpp 100000000 1000000 1 1 62977. NaN Inf 10000 1 34.89ms
</code>
<code> expression vec_length num_non_na min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <bch:tm> 1 base_r 10000 1000 54.2 57.8 1 Inf Inf 9978 22 193.91ms 2 which_min 10000 1000 1.63 1.66 36.6 NaN NaN 10000 0 5.31ms 3 which_1 10000 1000 53.0 55.9 1.16 Inf Inf 9977 23 167.24ms 4 position 10000 1000 2.86 3.11 20.0 NaN NaN 10000 0 9.71ms 5 non_na 10000 1000 1 1 51.3 Inf NaN 10000 0 3.79ms 6 rcpp 10000 1000 1.06 1.07 47.3 NaN NaN 10000 0 4.11ms 7 base_r 100000 1000 326. 353. 1.00 Inf Inf 3449 83 397.84ms 8 which_min 100000 1000 1.73 1.73 200. NaN NaN 10000 0 5.79ms 9 which_1 100000 1000 315. 358. 1 Inf Inf 3366 82 389.22ms 10 position 100000 1000 7.68 8.26 41.4 NaN Inf 9999 1 27.96ms 11 non_na 100000 1000 2.35 2.52 137. NaN NaN 10000 0 8.42ms 12 rcpp 100000 1000 1 1 345. NaN NaN 10000 0 3.35ms 13 base_r 1000000 1000 1257. 1514. 1.00 Inf Inf 223 82 302.37ms 14 which_min 1000000 1000 2.76 2.73 629. NaN NaN 10000 0 21.6ms 15 which_1 1000000 1000 1278. 1491. 1 Inf Inf 226 76 306.97ms 16 position 1000000 1000 634. 643. 2.68 NaN Inf 917 23 463.92ms 17 non_na 1000000 1000 134. 143. 12.0 NaN Inf 3787 41 429.6ms 18 rcpp 1000000 1000 1 1 1720. NaN NaN 10000 0 7.9ms 19 base_r 10000000 1000 10613. 11683. 1 Inf Inf 25 25 518.79ms 20 which_min 10000000 1000 3.23 3.23 3493. NaN NaN 10000 0 59.41ms 21 which_1 10000000 1000 10445. 10436. 1.10 Inf Inf 27 27 510.68ms 22 position 10000000 1000 882. 921. 11.1 NaN Inf 269 22 500.75ms 23 non_na 10000000 1000 189. 205. 49.8 NaN Inf 1201 42 500.07ms 24 rcpp 10000000 1000 1 1 11100. NaN NaN 10000 0 18.7ms 25 base_r 100000000 1000 39833. 39765. 1.01 Inf Inf 3 6 552.89ms 26 which_min 100000000 1000 10.3 10.3 3955. NaN NaN 10000 0 468.35ms 27 which_1 100000000 1000 40767. 40599. 1 Inf Inf 3 6 555.68ms 28 position 100000000 1000 1020. 1095. 34.2 NaN Inf 93 21 503.13ms 29 non_na 100000000 1000 211. 230. 164. NaN Inf 444 43 500.34ms 30 rcpp 100000000 1000 1 1 40816. NaN NaN 10000 0 45.38ms 31 base_r 10000 100000 57.8 74.9 1 Inf Inf 9998 2 295.12ms 32 which_min 10000 100000 1.53 1.54 60.2 NaN NaN 10000 0 4.9ms 33 which_1 10000 100000 47.3 50.9 1.93 Inf Inf 9998 2 153ms 34 position 10000 100000 2.85 2.97 32.3 NaN NaN 10000 0 9.13ms 35 non_na 10000 100000 1 1 92.9 NaN NaN 10000 0 3.18ms 36 rcpp 10000 100000 1.06 1.07 85.3 NaN NaN 10000 0 3.46ms 37 base_r 100000 100000 573. 593. 1 Inf Inf 2640 5 489.41ms 38 which_min 100000 100000 1.55 1.55 379. NaN NaN 10000 0 4.9ms 39 which_1 100000 100000 453. 479. 1.31 Inf Inf 3453 7 488.38ms 40 position 100000 100000 2.87 2.98 202. NaN NaN 10000 0 9.16ms 41 non_na 100000 100000 1 1 578. NaN NaN 10000 0 3.21ms 42 rcpp 100000 100000 1.06 1.06 578. NaN NaN 10000 0 3.21ms 43 base_r 1000000 100000 5830. 6048. 1 Inf Inf 244 5 482.78ms 44 which_min 1000000 100000 1.48 1.49 3903. NaN NaN 10000 0 5.07ms 45 which_1 1000000 100000 5717. 5937. 1.06 Inf Inf 261 4 487.61ms 46 position 1000000 100000 13.0 13.3 463. NaN Inf 9998 2 42.69ms 47 non_na 1000000 100000 3.48 3.65 1660. NaN NaN 10000 0 11.92ms 48 rcpp 1000000 100000 1 1 5784. NaN Inf 9999 1 3.42ms 49 base_r 10000000 100000 40802. 40182. 1 Inf Inf 24 7 367.53ms 50 which_min 10000000 100000 1.85 1.83 22655. NaN NaN 10000 0 6.76ms 51 which_1 10000000 100000 40334. 40123. 1.02 Inf Inf 23 10 344.07ms 52 position 10000000 100000 92.3 91.3 471. NaN Inf 9984 16 324.88ms 53 non_na 10000000 100000 20.3 21.1 2043. NaN Inf 9993 7 74.92ms 54 rcpp 10000000 100000 1 1 42212. NaN NaN 10000 0 3.63ms 55 base_r 100000000 100000 374396. 366226. 1.05 Inf Inf 3 6 548.73ms 56 which_min 100000000 100000 4.40 4.35 88633. NaN NaN 10000 0 21.62ms 57 which_1 100000000 100000 373680. 366947. 1 Inf Inf 3 6 574.97ms 58 position 100000000 100000 359. 364. 1029. NaN Inf 2685 23 500.11ms 59 non_na 100000000 100000 75.7 80.2 4240. NaN Inf 10000 38 452.07ms 60 rcpp 100000000 100000 1 1 387345. NaN NaN 10000 0 4.95ms 61 base_r 10000 1000000 59.1 63.5 1 Inf Inf 9998 2 210.55ms 62 which_min 10000 1000000 1.55 1.55 42.9 NaN NaN 10000 0 4.91ms 63 which_1 10000 1000000 45.6 50.6 1.42 Inf Inf 9998 2 148.62ms 64 position 10000 1000000 2.85 2.94 23.3 NaN NaN 10000 0 9.03ms 65 non_na 10000 1000000 1 1 66.8 NaN NaN 10000 0 3.15ms 66 rcpp 10000 1000000 1.06 1.06 64.7 NaN NaN 10000 0 3.25ms 67 base_r 100000 1000000 538. 562. 1 Inf Inf 2801 6 489.45ms 68 which_min 100000 1000000 1.46 1.55 247. NaN NaN 10000 0 7.08ms 69 which_1 100000 1000000 432. 456. 1.22 Inf Inf 3404 7 488.07ms 70 position 100000 1000000 2.71 2.91 185. NaN NaN 10000 0 9.44ms 71 non_na 100000 1000000 1 1 520. NaN NaN 10000 0 3.36ms 72 rcpp 100000 1000000 1.06 1.07 504. NaN NaN 10000 0 3.47ms 73 base_r 1000000 1000000 5571. 5799. 1 Inf Inf 260 6 481.33ms 74 which_min 1000000 1000000 1.54 1.52 3729. NaN NaN 10000 0 4.96ms 75 which_1 1000000 1000000 4557. 4637. 1.29 Inf Inf 336 7 481.23ms 76 position 1000000 1000000 2.84 2.89 2004. NaN NaN 10000 0 9.24ms 77 non_na 1000000 1000000 1 1 5494. NaN NaN 10000 0 3.37ms 78 rcpp 1000000 1000000 1.06 1.02 5732. NaN NaN 10000 0 3.23ms 79 base_r 10000000 1000000 69156. 69066. 1 Inf Inf 17 5 377.72ms 80 which_min 10000000 1000000 1.46 1.47 43589. NaN NaN 10000 0 5.1ms 81 which_1 10000000 1000000 67884. 66517. 1.05 Inf Inf 19 5 400.27ms 82 position 10000000 1000000 4.03 4.16 15632. NaN NaN 10000 0 14.21ms 83 non_na 10000000 1000000 1.32 1.31 44276. NaN NaN 10000 0 5.02ms 84 rcpp 10000000 1000000 1 1 66408. NaN Inf 9999 1 3.35ms 85 base_r 100000000 1000000 662166. 619018. 1.04 Inf Inf 3 4 635.95ms 86 which_min 100000000 1000000 1.90 1.86 326808. NaN NaN 10000 0 6.72ms 87 which_1 100000000 1000000 589090. 557688. 1 Inf Inf 3 3 659.17ms 88 position 100000000 1000000 31.4 31.0 19264. NaN Inf 10000 4 114.06ms 89 non_na 100000000 1000000 7.14 7.59 74081. NaN Inf 10000 2 29.66ms 90 rcpp 100000000 1000000 1 1 62977. NaN Inf 10000 1 34.89ms </code>
   expression vec_length num_non_na       min    median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
   <bch:expr>      <dbl>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>    <dbl> <int> <dbl>   <bch:tm>
 1 base_r          10000       1000     54.2      57.8       1          Inf      Inf  9978    22   193.91ms
 2 which_min       10000       1000      1.63      1.66     36.6        NaN      NaN 10000     0     5.31ms
 3 which_1         10000       1000     53.0      55.9       1.16       Inf      Inf  9977    23   167.24ms
 4 position        10000       1000      2.86      3.11     20.0        NaN      NaN 10000     0     9.71ms
 5 non_na          10000       1000      1         1        51.3        Inf      NaN 10000     0     3.79ms
 6 rcpp            10000       1000      1.06      1.07     47.3        NaN      NaN 10000     0     4.11ms
 7 base_r         100000       1000    326.      353.        1.00       Inf      Inf  3449    83   397.84ms
 8 which_min      100000       1000      1.73      1.73    200.         NaN      NaN 10000     0     5.79ms
 9 which_1        100000       1000    315.      358.        1          Inf      Inf  3366    82   389.22ms
10 position       100000       1000      7.68      8.26     41.4        NaN      Inf  9999     1    27.96ms
11 non_na         100000       1000      2.35      2.52    137.         NaN      NaN 10000     0     8.42ms
12 rcpp           100000       1000      1         1       345.         NaN      NaN 10000     0     3.35ms
13 base_r        1000000       1000   1257.     1514.        1.00       Inf      Inf   223    82   302.37ms
14 which_min     1000000       1000      2.76      2.73    629.         NaN      NaN 10000     0     21.6ms
15 which_1       1000000       1000   1278.     1491.        1          Inf      Inf   226    76   306.97ms
16 position      1000000       1000    634.      643.        2.68       NaN      Inf   917    23   463.92ms
17 non_na        1000000       1000    134.      143.       12.0        NaN      Inf  3787    41    429.6ms
18 rcpp          1000000       1000      1         1      1720.         NaN      NaN 10000     0      7.9ms
19 base_r       10000000       1000  10613.    11683.        1          Inf      Inf    25    25   518.79ms
20 which_min    10000000       1000      3.23      3.23   3493.         NaN      NaN 10000     0    59.41ms
21 which_1      10000000       1000  10445.    10436.        1.10       Inf      Inf    27    27   510.68ms
22 position     10000000       1000    882.      921.       11.1        NaN      Inf   269    22   500.75ms
23 non_na       10000000       1000    189.      205.       49.8        NaN      Inf  1201    42   500.07ms
24 rcpp         10000000       1000      1         1     11100.         NaN      NaN 10000     0     18.7ms
25 base_r      100000000       1000  39833.    39765.        1.01       Inf      Inf     3     6   552.89ms
26 which_min   100000000       1000     10.3      10.3    3955.         NaN      NaN 10000     0   468.35ms
27 which_1     100000000       1000  40767.    40599.        1          Inf      Inf     3     6   555.68ms
28 position    100000000       1000   1020.     1095.       34.2        NaN      Inf    93    21   503.13ms
29 non_na      100000000       1000    211.      230.      164.         NaN      Inf   444    43   500.34ms
30 rcpp        100000000       1000      1         1     40816.         NaN      NaN 10000     0    45.38ms
31 base_r          10000     100000     57.8      74.9       1          Inf      Inf  9998     2   295.12ms
32 which_min       10000     100000      1.53      1.54     60.2        NaN      NaN 10000     0      4.9ms
33 which_1         10000     100000     47.3      50.9       1.93       Inf      Inf  9998     2      153ms
34 position        10000     100000      2.85      2.97     32.3        NaN      NaN 10000     0     9.13ms
35 non_na          10000     100000      1         1        92.9        NaN      NaN 10000     0     3.18ms
36 rcpp            10000     100000      1.06      1.07     85.3        NaN      NaN 10000     0     3.46ms
37 base_r         100000     100000    573.      593.        1          Inf      Inf  2640     5   489.41ms
38 which_min      100000     100000      1.55      1.55    379.         NaN      NaN 10000     0      4.9ms
39 which_1        100000     100000    453.      479.        1.31       Inf      Inf  3453     7   488.38ms
40 position       100000     100000      2.87      2.98    202.         NaN      NaN 10000     0     9.16ms
41 non_na         100000     100000      1         1       578.         NaN      NaN 10000     0     3.21ms
42 rcpp           100000     100000      1.06      1.06    578.         NaN      NaN 10000     0     3.21ms
43 base_r        1000000     100000   5830.     6048.        1          Inf      Inf   244     5   482.78ms
44 which_min     1000000     100000      1.48      1.49   3903.         NaN      NaN 10000     0     5.07ms
45 which_1       1000000     100000   5717.     5937.        1.06       Inf      Inf   261     4   487.61ms
46 position      1000000     100000     13.0      13.3     463.         NaN      Inf  9998     2    42.69ms
47 non_na        1000000     100000      3.48      3.65   1660.         NaN      NaN 10000     0    11.92ms
48 rcpp          1000000     100000      1         1      5784.         NaN      Inf  9999     1     3.42ms
49 base_r       10000000     100000  40802.    40182.        1          Inf      Inf    24     7   367.53ms
50 which_min    10000000     100000      1.85      1.83  22655.         NaN      NaN 10000     0     6.76ms
51 which_1      10000000     100000  40334.    40123.        1.02       Inf      Inf    23    10   344.07ms
52 position     10000000     100000     92.3      91.3     471.         NaN      Inf  9984    16   324.88ms
53 non_na       10000000     100000     20.3      21.1    2043.         NaN      Inf  9993     7    74.92ms
54 rcpp         10000000     100000      1         1     42212.         NaN      NaN 10000     0     3.63ms
55 base_r      100000000     100000 374396.   366226.        1.05       Inf      Inf     3     6   548.73ms
56 which_min   100000000     100000      4.40      4.35  88633.         NaN      NaN 10000     0    21.62ms
57 which_1     100000000     100000 373680.   366947.        1          Inf      Inf     3     6   574.97ms
58 position    100000000     100000    359.      364.     1029.         NaN      Inf  2685    23   500.11ms
59 non_na      100000000     100000     75.7      80.2    4240.         NaN      Inf 10000    38   452.07ms
60 rcpp        100000000     100000      1         1    387345.         NaN      NaN 10000     0     4.95ms
61 base_r          10000    1000000     59.1      63.5       1          Inf      Inf  9998     2   210.55ms
62 which_min       10000    1000000      1.55      1.55     42.9        NaN      NaN 10000     0     4.91ms
63 which_1         10000    1000000     45.6      50.6       1.42       Inf      Inf  9998     2   148.62ms
64 position        10000    1000000      2.85      2.94     23.3        NaN      NaN 10000     0     9.03ms
65 non_na          10000    1000000      1         1        66.8        NaN      NaN 10000     0     3.15ms
66 rcpp            10000    1000000      1.06      1.06     64.7        NaN      NaN 10000     0     3.25ms
67 base_r         100000    1000000    538.      562.        1          Inf      Inf  2801     6   489.45ms
68 which_min      100000    1000000      1.46      1.55    247.         NaN      NaN 10000     0     7.08ms
69 which_1        100000    1000000    432.      456.        1.22       Inf      Inf  3404     7   488.07ms
70 position       100000    1000000      2.71      2.91    185.         NaN      NaN 10000     0     9.44ms
71 non_na         100000    1000000      1         1       520.         NaN      NaN 10000     0     3.36ms
72 rcpp           100000    1000000      1.06      1.07    504.         NaN      NaN 10000     0     3.47ms
73 base_r        1000000    1000000   5571.     5799.        1          Inf      Inf   260     6   481.33ms
74 which_min     1000000    1000000      1.54      1.52   3729.         NaN      NaN 10000     0     4.96ms
75 which_1       1000000    1000000   4557.     4637.        1.29       Inf      Inf   336     7   481.23ms
76 position      1000000    1000000      2.84      2.89   2004.         NaN      NaN 10000     0     9.24ms
77 non_na        1000000    1000000      1         1      5494.         NaN      NaN 10000     0     3.37ms
78 rcpp          1000000    1000000      1.06      1.02   5732.         NaN      NaN 10000     0     3.23ms
79 base_r       10000000    1000000  69156.    69066.        1          Inf      Inf    17     5   377.72ms
80 which_min    10000000    1000000      1.46      1.47  43589.         NaN      NaN 10000     0      5.1ms
81 which_1      10000000    1000000  67884.    66517.        1.05       Inf      Inf    19     5   400.27ms
82 position     10000000    1000000      4.03      4.16  15632.         NaN      NaN 10000     0    14.21ms
83 non_na       10000000    1000000      1.32      1.31  44276.         NaN      NaN 10000     0     5.02ms
84 rcpp         10000000    1000000      1         1     66408.         NaN      Inf  9999     1     3.35ms
85 base_r      100000000    1000000 662166.   619018.        1.04       Inf      Inf     3     4   635.95ms
86 which_min   100000000    1000000      1.90      1.86 326808.         NaN      NaN 10000     0     6.72ms
87 which_1     100000000    1000000 589090.   557688.        1          Inf      Inf     3     3   659.17ms
88 position    100000000    1000000     31.4      31.0   19264.         NaN      Inf 10000     4   114.06ms
89 non_na      100000000    1000000      7.14      7.59  74081.         NaN      Inf 10000     2    29.66ms
90 rcpp        100000000    1000000      1         1     62977.         NaN      Inf 10000     1    34.89ms

Plot of results (log time scale):

13

You can use which.min and which.max (without is.na()), since as said in the manual:

Missing and NaN values are discarded.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>set.seed(1)
x <- c(rep(NA, 3), sample(c(T, F), size = 5e7, replace = T))
f1 <- () min(which(!is.na(x)))
f2 <- () which.min(is.na(x))
f3 <- () min(which.min(x), which.max(x), Inf)
microbenchmark(
f1 = f1(),
f2 = f2(),
f3 = f3(),
unit = "relative",
check = "equal"
)
</code>
<code>set.seed(1) x <- c(rep(NA, 3), sample(c(T, F), size = 5e7, replace = T)) f1 <- () min(which(!is.na(x))) f2 <- () which.min(is.na(x)) f3 <- () min(which.min(x), which.max(x), Inf) microbenchmark( f1 = f1(), f2 = f2(), f3 = f3(), unit = "relative", check = "equal" ) </code>
set.seed(1)
x <- c(rep(NA, 3), sample(c(T, F), size = 5e7, replace = T))

f1 <- () min(which(!is.na(x)))
f2 <- () which.min(is.na(x))
f3 <- () min(which.min(x), which.max(x), Inf)
microbenchmark(
  f1 = f1(),
  f2 = f2(),
  f3 = f3(),
  unit = "relative",
  check = "equal"
)

and you will see

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>Unit: relative
expr min lq mean median uq max neval
f1 931928.5 596652.0 12690.309 31796.371 33326.273 232.24601 100
f2 139656.2 98002.5 2770.152 5879.894 8057.321 84.39669 100
f3 1.0 1.0 1.000 1.000 1.000 1.00000 100
</code>
<code>Unit: relative expr min lq mean median uq max neval f1 931928.5 596652.0 12690.309 31796.371 33326.273 232.24601 100 f2 139656.2 98002.5 2770.152 5879.894 8057.321 84.39669 100 f3 1.0 1.0 1.000 1.000 1.000 1.00000 100 </code>
Unit: relative
 expr      min       lq      mean    median        uq       max neval
   f1 931928.5 596652.0 12690.309 31796.371 33326.273 232.24601   100
   f2 139656.2  98002.5  2770.152  5879.894  8057.321  84.39669   100
   f3      1.0      1.0     1.000     1.000     1.000   1.00000   100

Remark

From the benchmarking, it seems is.na is an expensive operator from the speed perspective, especially when the size of array scales up

9

Very late to the party but didn’t see anyone mention Position() that does what you are looking for and stops scanning as soon as a match is found:

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>Position((x) !is.na(x), x) # 4
</code>
<code>Position((x) !is.na(x), x) # 4 </code>
Position((x) !is.na(x), x) # 4

0

on an Intel Mac, OS 13.5.2

Edit: retest with purely logical input

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code> x <- as.logical(c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen)))
Rgames> microbenchmark(which_non_na(x),nonNA(x))
Unit: nanoseconds
expr min lq mean median uq max neval
which_non_na(x) 691 735.0 1192.88 898.0 1342.5 16088 100
nonNA(x) 45624 46285.5 47195.47 46569.5 46928.5 102274 100
</code>
<code> x <- as.logical(c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen))) Rgames> microbenchmark(which_non_na(x),nonNA(x)) Unit: nanoseconds expr min lq mean median uq max neval which_non_na(x) 691 735.0 1192.88 898.0 1342.5 16088 100 nonNA(x) 45624 46285.5 47195.47 46569.5 46928.5 102274 100 </code>
 x <- as.logical(c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen)))
Rgames> microbenchmark(which_non_na(x),nonNA(x))
Unit: nanoseconds
            expr   min      lq     mean  median      uq    max neval
 which_non_na(x)   691   735.0  1192.88   898.0  1342.5  16088   100
        nonNA(x) 45624 46285.5 47195.47 46569.5 46928.5 102274   100

Dunno how c++ would fare if it ran on numeric inputs but didn’t declare “logical.

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>nonNA <- function(x) {
for(jn in 1:length(x)) {
if(!is.na(x[jn])) return( jn)
}
return('not found')
}
biglen=1e6
x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen))
microbenchmark(which_non_na(x),nonNA(x))
Unit: microseconds
expr min lq mean median uq max neval
which_non_na(x) 1432.809 1596.131 2398.95178 1606.1205 1615.342 11521.484 100
nonNA(x) 46.436 47.125 55.06929 47.8625 56.734 111.771 100
biglen=1e8
x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen))
microbenchmark(which_non_na(x),nonNA(x))
Unit: microseconds
expr min lq mean median uq max neval
which_non_na(x) 322243.196 327702.4310 349913.34584 353146.7080 370955.373 385839.443 100
nonNA(x) 46.642 48.3305 86.83075 100.3525 115.982 256.817 100
</code>
<code>nonNA <- function(x) { for(jn in 1:length(x)) { if(!is.na(x[jn])) return( jn) } return('not found') } biglen=1e6 x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen)) microbenchmark(which_non_na(x),nonNA(x)) Unit: microseconds expr min lq mean median uq max neval which_non_na(x) 1432.809 1596.131 2398.95178 1606.1205 1615.342 11521.484 100 nonNA(x) 46.436 47.125 55.06929 47.8625 56.734 111.771 100 biglen=1e8 x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen)) microbenchmark(which_non_na(x),nonNA(x)) Unit: microseconds expr min lq mean median uq max neval which_non_na(x) 322243.196 327702.4310 349913.34584 353146.7080 370955.373 385839.443 100 nonNA(x) 46.642 48.3305 86.83075 100.3525 115.982 256.817 100 </code>
nonNA <- function(x) {
    for(jn in 1:length(x)) {
        if(!is.na(x[jn])) return( jn)
    }
    return('not found')
}

biglen=1e6
x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen))

microbenchmark(which_non_na(x),nonNA(x))
Unit: microseconds
            expr      min       lq       mean    median       uq       max neval
 which_non_na(x) 1432.809 1596.131 2398.95178 1606.1205 1615.342 11521.484   100
        nonNA(x)   46.436   47.125   55.06929   47.8625   56.734   111.771   100

biglen=1e8
 x <- c(NA,NA,NA,rep(1,times=20), rep(NA,times=5),rep(1,times=biglen))
microbenchmark(which_non_na(x),nonNA(x))
Unit: microseconds
            expr        min          lq         mean      median         uq        max neval
 which_non_na(x) 322243.196 327702.4310 349913.34584 353146.7080 370955.373 385839.443   100
        nonNA(x)     46.642     48.3305     86.83075    100.3525    115.982    256.817   100

4

A little faster than using min and base R

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>which(!is.na(x))[1]
</code>
<code>which(!is.na(x))[1] </code>
which(!is.na(x))[1]

finds the first non-NA value from 100,000,000 values in less than a second, even on a crappy cpu.

7

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật