I have a vector with many (let’s say millions of) values between 0 and K, and a data frame of non-overlapping intervals (‘start’ and ‘end’ columns). What’s the most time- and memory-efficient way to map every value to the interval that contains it, if any? A straightforward approach is of course to use sapply over the values, but this is very slow.
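For concreteness, here is a minimal sketch of the kind of sapply approach I mean. The toy data and the helper name mapToInterval are just placeholders for illustration, and I'm assuming a value counts as inside an interval when it lies strictly between start and end:

```r
# Toy data; sizes and values are placeholders, not my real data
set.seed(1)
K <- 100
values <- runif(10, min = 0, max = K)
intervals <- data.frame(start = c(0, 25, 60), end = c(20, 50, 100))

# Hypothetical helper: row index of the interval containing v, or NA if none does
mapToInterval <- function(v, intervals) {
  hit <- which(v > intervals$start & v < intervals$end)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# One full scan of the interval table per value, which is why this gets slow
intervalIndex <- sapply(values, mapToInterval, intervals = intervals)
```

Every value triggers a scan over all intervals, so the work grows with (number of values) x (number of intervals).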
I figured I could perhaps be cleverer and use matrix operations to get a matrix indicating, for each point, whether it’s within each interval, e.g.:
lessThanEnd <- (values %*% t(1/intervals$end)) < 1
greaterThanStart <- (values %*% t(1/intervals$start)) > 1
withinInterval <- lessThanEnd * greaterThanStart
But each of these matrices has one row per value and one column per interval, so with a large number of points and intervals they get very large or even exceed the memory limit.
Is there a faster, less memory-taxing way to do this that’s not occurring to me? Or does a package like GenomicRanges already have a function for this that I’m not aware of? Thanks.