I need to apply an optimization function, which returns a number from two input variables, to millions of rows of data. Applying it with mapply takes far too long. Because the data cover a limited range of values with many repeats, I figured it would be faster to build a lookup table covering every possible input combination and then use a dplyr join to fill in the results. My attempts have failed because of some strange behavior in my lookup table (actually a data frame). Below is a reproducible example. I'm trying to understand why I cannot reference some values in the lookup table while others work fine.
#load dplyr for left_join
library(dplyr)
#build a lookup table with three columns
lookup <- expand.grid(d = 0:180, s = seq(0, 35, by = 0.1))
lookup$v <- 1:nrow(lookup)
#generate some fake data with random values
data <- data.frame(d = sample(lookup$d, 1000, replace = TRUE),
                   s = round(runif(n = 1000, min = 0, max = 35), 1))
#left join gives a lot of NAs
joined <- left_join(x=data, y=lookup, by=c('d','s'))
joined[1:20,]
#      d    s     v
#  1 179  4.6    NA
#  2  45 12.9 23395
#  3 162 28.1 51024
#  4  25 11.2    NA
#  5  65 14.4 26130
#  6  64 32.5 58890
#  7  95  9.3 16929
#  8  70 31.6 57267
#  9  92 26.4    NA
# 10 129 12.0 21850
# 11   7 11.0 19918
# 12  59 34.6 62686
# 13   3  8.1 14665
# 14 146  3.0  5577
# 15  16 26.8 48525
# 16  15 27.5 49791
# 17 136  6.2 11359
# 18  70  6.5 11836
# 19 103  0.4   828
# 20  99 22.6 41006
#unable to reference certain values...
lookup$v[which(lookup$d == 177 & lookup$s == 8.2)]
#integer(0)
#...even though they exist
lookup[15020,]
#        d   s     v
#15020 177 8.2 15020
#other values are OK
lookup$v[which(lookup$d == 177 & lookup$s == 8.3)]
#[1] 15201
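
One possible explanation, offered as a sketch rather than a confirmed diagnosis: seq(0, 35, by = 0.1) builds its values with floating-point arithmetic, so some elements can differ from the literal 8.2 (or from round(x, 1)) in the last bits even though they print the same. The snippet below prints the stored value with extra digits, compares with a tolerance instead of ==, and then keys the lookup on integer tenths. The column s10 and the tolerance tol are hypothetical names introduced here for illustration, and base match() stands in for left_join only to keep the example free of dependencies.

```r
# Rebuild the lookup table from the example above.
lookup <- expand.grid(d = 0:180, s = seq(0, 35, by = 0.1))
lookup$v <- seq_len(nrow(lookup))

# Print the stored value with extra digits; it may not be exactly 8.2.
print(sprintf("%.17g", lookup$s[15020]))

# Compare with a tolerance instead of `==` (tol is an assumed value).
tol <- 1e-8
hit <- which(lookup$d == 177 & abs(lookup$s - 8.2) < tol)

# Workaround sketch: key on integer tenths (hypothetical column s10),
# so that matching never depends on exact equality of doubles.
lookup$s10 <- as.integer(round(lookup$s * 10))

data <- data.frame(d = sample(lookup$d, 1000, replace = TRUE),
                   s = round(runif(1000, min = 0, max = 35), 1))
data$s10 <- as.integer(round(data$s * 10))

# Look up v by the (d, s10) pair using base R.
idx <- match(paste(data$d, data$s10), paste(lookup$d, lookup$s10))
data$v <- lookup$v[idx]

# Number of rows left unmatched after keying on integers.
print(sum(is.na(data$v)))
```

If this is indeed the cause, the same idea should carry over to dplyr by joining on c('d', 's10') instead of c('d', 's').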