I have pooled cross-section data (`df`) of different states at different points in time. I’m trying to see how weather events during the crops’ growing period affected how the crops are currently priced. To do so, I would like to lag my precipitation variable by 1 month, 2 months, and 3 months and add these new variables to `df`. My current `df` looks like this:
Date | State | Price | Precipitation |
---|---|---|---|
05/27/21 | MA | 1.30 | 0.5 |
05/13/21 | MA | 1.28 | 1.7 |
06/10/21 | NH | 1.40 | 1.5 |
01/15/22 | NY | 3 | 2 |
I have complete time-series data on precipitation for each state (`precip`), which I used to match the precipitation in `df` to the given dates. I would like to lag the dates in `df` by 1, 2, 3,…, and 6 months, then look back to the `precip` dataset and match each row with that state’s precipitation from the month(s) prior. This might be worded poorly, but it would look something like this:
Date | State | Price | Precipitation | precip -1 | precip -2 | … |
---|---|---|---|---|---|---|
05/27/21 | MA | 1.30 | 0.5 | precip in MA on 04/27/21 | precip on 03/27/21 | |
05/13/21 | MA | 1.28 | 1.7 | precip in MA on 04/13/21 | precip on 03/13/21 | |
06/10/21 | NH | 1.40 | 1.5 | precip in NH on 05/10/21 | precip on 04/10/21 | |
01/15/22 | NY | 3 | 2 | precip in NY on 12/15/21 | precip on 11/15/21 | |
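The lagged dates in the table above can be produced in base R with a minimal sketch like the following, assuming `df$Date` arrives as `mm/dd/yy` text as shown. The helper `shift_months()` and the `date_1` … `date_6` column names are hypothetical, not from any package: the helper moves each date back `k` calendar months while keeping the day of the month.

```r
# Example rows from the table above; Date parsed from mm/dd/yy text
df <- data.frame(
  Date = as.Date(c("05/27/21", "05/13/21", "06/10/21", "01/15/22"), format = "%m/%d/%y"),
  State = c("MA", "MA", "NH", "NY"),
  Price = c(1.30, 1.28, 1.40, 3),
  Precipitation = c(0.5, 1.7, 1.5, 2)
)

# Hypothetical helper: move each date back k calendar months, keeping the day.
# Dates that do not exist (e.g. Feb 30) come back as NA instead of rolling over.
shift_months <- function(dates, k) {
  y <- as.integer(format(dates, "%Y"))
  m <- as.integer(format(dates, "%m"))
  d <- as.integer(format(dates, "%d"))
  total <- y * 12 + (m - 1) - k  # months counted from year 0, shifted back k
  as.Date(sprintf("%04d-%02d-%02d", total %/% 12, total %% 12 + 1, d),
          format = "%Y-%m-%d")
}

# One lagged-date column per lag, e.g. date_1 = one month before Date
for (k in 1:6) {
  df[[paste0("date_", k)]] <- shift_months(df$Date, k)
}
df[, c("Date", "State", "date_1", "date_2")]
```

Year boundaries are handled by the month arithmetic, so 01/15/22 lagged by one month gives 12/15/21.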
I tried `df$precip_1 <- lag(as.xts(df$Precipitation), k = 1)`, but I don’t think this is correct, since `df` isn’t time-series data. I was thinking of creating new variables with the lagged dates, matching those up to `precip` directly, and then adding the results to `df`. I’ve never worked with lags before, so any help would be appreciated!
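The idea of building lagged dates and matching them against `precip` can be sketched in base R as below. This differs from `lag()`, which shifts values by position, so on pooled data the "previous" row may belong to another state or date. The sketch assumes `precip` is a daily table with columns `Date` (class `Date`), `State`, and `Precipitation`; those names, the toy values, and the helpers `shift_months()` and `add_precip_lags()` are all hypothetical.

```r
# Toy inputs: df as in the question; precip is an assumed daily
# State/Date/Precipitation table filled with made-up values.
df <- data.frame(
  Date = as.Date(c("05/27/21", "05/13/21", "06/10/21", "01/15/22"), format = "%m/%d/%y"),
  State = c("MA", "MA", "NH", "NY"),
  Price = c(1.30, 1.28, 1.40, 3),
  Precipitation = c(0.5, 1.7, 1.5, 2)
)
days <- seq(as.Date("2020-11-01"), as.Date("2022-01-31"), by = "day")
precip <- expand.grid(Date = days, State = c("MA", "NH", "NY"),
                      stringsAsFactors = FALSE)
# Fake values that encode the date, e.g. 2021-04-27 -> 4.27
precip$Precipitation <- as.numeric(format(precip$Date, "%m")) +
  as.numeric(format(precip$Date, "%d")) / 100

# Hypothetical helper: shift dates back k calendar months, keeping the day
# (NA if that day does not exist in the target month, e.g. Feb 30).
shift_months <- function(dates, k) {
  total <- as.integer(format(dates, "%Y")) * 12 + as.integer(format(dates, "%m")) - 1 - k
  as.Date(sprintf("%04d-%02d-%s", total %/% 12, total %% 12 + 1, format(dates, "%d")),
          format = "%Y-%m-%d")
}

# Hypothetical helper: for each lag k, look up precipitation for the same
# State on the lagged date; rows with no match in precip get NA.
add_precip_lags <- function(df, precip, lags = 1:6) {
  precip_key <- paste(precip$State, precip$Date)
  for (k in lags) {
    lag_key <- paste(df$State, shift_months(df$Date, k))
    df[[paste0("precip_", k)]] <- precip$Precipitation[match(lag_key, precip_key)]
  }
  df
}

df <- add_precip_lags(df, precip)
df[, c("Date", "State", "precip_1", "precip_2", "precip_3")]
```

If `precip` holds monthly values rather than daily ones, the same idea works with a year-month key such as `format(date, "%Y-%m")` on both sides. If `lubridate` is available, `df$Date %m-% months(k)` is an alternative for the shift, though it rolls nonexistent dates back to the last day of the month instead of returning `NA`.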