I have very unstructured data for following variables:
Host Home Industry Value Year value_lag
A X I 1 2001 NA
B X I 2 2001 NA
C X I 3 2003 NA
A X I 1 2003 NA
B X I 1 2004 4
A X I 3 2005 1
C X I 2 2005 1
For every case, I want to get the sum of deal value for provious year only. If there is no observation in previous year, then I want value to be NA. Last column is the desired output.
I am more familiar with data.table
library.
I used the following code to get the output. I am getting the right output from this loop. However, it is very slow.
library(data.table)
setDT(dealsf)
df <- dealsf
value_lag <- rep(NA, nrow(dealsf))
for(i in 1:nrow(dealsf)) {
host <- dealsf[i, Host]
acq_sector <- dealsf[i, Industry]
home <- dealsf[i, Home]
yr <- dealsf[i, year]
value_lag[i] <- dealsf[Home_i == home & Industry == acq_sector & year == yr - 1 & Host == host,
sum(Value, na.rm = T)]
if( i%%10 == 0) print(i)
}
5