I am trying to fit a multiple linear regression with 4 predictors and 1 response variable in R. Each variable is stored in a separate .csv file, since each file holds more than 1000 data points. I am trying to merge all the files on a common key column “Year”.
Here ‘Temp’, ‘Ppt’, ‘ET’, and ‘WS’ are the predictor variables, and ‘SM’ is the response variable.
My dataset looks like this:
Temperature data:
| Year | ST1 |
|:---- |:----:|
| 1991 | 16.4 |
| 1992 | 17 |
| 1993 | 16.9 |
| 1994 | 18.9 |
| 1995 | 17.2 |
| 1996 | 18.2 |
Ppt data:
| Year | ST1 |
|:---- |:----:|
| 1991 | 2 |
| 1992 | 5 |
| 1993 | 2 |
| 1994 | 4 |
| 1995 | 8 |
| 1996 | 7 |
ET data:
| Year | ST1 |
|:---- |:----:|
| 1991 | 3 |
| 1992 | 4.5 |
| 1993 | 6.4 |
| 1994 | 4.5 |
| 1995 | 5.3 |
| 1996 | 18.2 |
WS data:
| Year | ST1 |
|:---- |:----:|
| 1991 | 3 |
| 1992 | 3.4 |
| 1993 | 4.2 |
| 1994 | 4.5 |
| 1995 | 3.4 |
| 1996 | 5.3 |
SM data (response variable):
| Year | ST1 |
|:---- |:----:|
| 1991 | 10.2 |
| 1992 | 14.5 |
| 1993 | 16.4 |
| 1994 | 14.5 |
| 1995 | 15.3 |
| 1996 | 12.3 |
I am trying the following script:
setwd('D:/MLR/NEW')
df  <- read.csv("SM.csv")
df1 <- read.csv("TEMP.csv")
df2 <- read.csv("PPT.csv")
df3 <- read.csv("WS.csv")
df4 <- read.csv("ET.csv")
response_variable <- "df$st1"
predictor_variables <- c("df1$st1", "df2$st1", "df3$st1", "df4$st1")
formula <- as.formula(paste(response_variable, "~", paste(predictor_variables, collapse = "+")))
merged_d <- merge(df1, df2, df3, df4, by = "Year")  # Error occurs on this line: merge() does not work for more than two files
merged <- merge(merged_d, df, by = "Year")
model9 <- lm(formula, data = merged)
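For reference, here is a minimal sketch of one way the merge step might be made to work. It rests on several assumptions: base R's `merge()` joins only two data frames per call, so the sketch chains it with `Reduce()`. The value column is taken to be named `ST1` exactly as in the tables above (R is case-sensitive, so `st1` would not match). The small data frames stand in for the CSV files, and the names `tables`, `merged`, and `model` are illustrative only.

```r
# In-memory stand-ins for the CSV files (values copied from the tables above);
# in practice each data frame would come from read.csv().
years <- 1991:1996
sm   <- data.frame(Year = years, ST1 = c(10.2, 14.5, 16.4, 14.5, 15.3, 12.3))
temp <- data.frame(Year = years, ST1 = c(16.4, 17, 16.9, 18.9, 17.2, 18.2))
ppt  <- data.frame(Year = years, ST1 = c(2, 5, 2, 4, 8, 7))
ws   <- data.frame(Year = years, ST1 = c(3, 3.4, 4.2, 4.5, 3.4, 5.3))
et   <- data.frame(Year = years, ST1 = c(3, 4.5, 6.4, 4.5, 5.3, 18.2))

# Rename each ST1 column to a variable-specific name so the columns
# stay distinguishable after merging.
tables <- list(SM = sm, Temp = temp, Ppt = ppt, WS = ws, ET = et)
tables <- Map(function(d, nm) {
  names(d)[names(d) == "ST1"] <- nm
  d
}, tables, names(tables))

# merge() takes two data frames at a time, so chain it over the list.
merged <- Reduce(function(x, y) merge(x, y, by = "Year"), tables)

# The formula refers to column names inside `merged`, not to df$ST1-style strings.
model <- lm(SM ~ Temp + Ppt + WS + ET, data = merged)
coef(model)
```

With `data = merged`, the formula only needs bare column names, which is why the `df$st1`-style strings are dropped in this sketch.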
I am trying to figure out how to apply this to multiple files and multiple grid points at once, and I want to extract the coefficients for each grid point.
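To make the goal concrete, the sketch below shows one possible shape for the per-grid-point fit. It assumes a layout not confirmed by the sample tables: each file has a `Year` column plus one column per grid point. The columns `ST2` and `ST3`, the simulated values, and the helpers `make_table` and `fit_one` are all hypothetical.

```r
# Hypothetical layout: every file has Year plus one column per grid point.
# The data are simulated; ST2 and ST3 are made-up grid-point names.
set.seed(1)
years <- 1991:2010
make_table <- function() {
  data.frame(Year = years, ST1 = rnorm(20), ST2 = rnorm(20), ST3 = rnorm(20))
}
tables <- list(SM = make_table(), Temp = make_table(), Ppt = make_table(),
               WS = make_table(), ET = make_table())

# Grid-point columns are everything except the key column.
grid_points <- setdiff(names(tables$SM), "Year")

# For one grid point: take that column from every table, rename it to the
# variable name, merge on Year, fit the model, and return the coefficients.
fit_one <- function(gp) {
  per_var <- Map(function(d, nm) setNames(d[, c("Year", gp)], c("Year", nm)),
                 tables, names(tables))
  merged <- Reduce(function(x, y) merge(x, y, by = "Year"), per_var)
  coef(lm(SM ~ Temp + Ppt + WS + ET, data = merged))
}

# One row per grid point, one column per coefficient.
coefs <- t(sapply(grid_points, fit_one))
coefs
```

The result is a matrix with grid points as rows, so `coefs["ST1", "Temp"]` would give the temperature coefficient for the first grid point under these assumptions.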
Any guidance or help would be appreciated.
Many thanks in advance.