I am confused about fitting a multiple linear regression with a factor in R. A is a binary variable coded as 1 and -1 and entered into lm as a factor. When making a prediction, how should A = -1 be handled, given that the reported coefficient is for A = 1? I tried to compute the fitted values for the first five observations by hand, but predict and my calculation don't give the same result. What mistake did I make here?
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))
n <- 10000
X1 <- rnorm(n)
A <- 2*rbinom(n, 1, expit(X1))-1
X2 <- rnorm(n)
Y <- X1 + X2 + A*X1 + rnorm(n)
mydata <- data.frame(X1, A, X2, Y)
mydata$A <- as.factor(mydata$A)
mod <- lm(Y ~ X1 + X2 + as.factor(A) + as.factor(A):X1, data = mydata)
predict(mod, newdata = mydata[1:5,])
-0.2107963 0.2664868 -0.4601201 2.5115522 -0.1143373
coef(mod)[1] + coef(mod)[2]*mydata[1:5,"X1"] + coef(mod)[3]*mydata[1:5,"X2"] + coef(mod)[4]*as.integer(mydata[1:5,"A"]) + coef(mod)[5]*as.integer(mydata[1:5,"A"])*mydata[1:5,"X1"]
-1.4523706 0.6258031 -2.1150596 5.6605056 0.5332321
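For reference, here is a minimal diagnostic sketch of how the factor is turned into numbers, assuming R's default treatment contrasts for unordered factors. It rebuilds the same simulated data, compares what as.integer() returns for the factor with the dummy column that model.matrix builds, and recomputes the predictions from that design matrix. The names codes, mm, manual and pred are introduced here for illustration only, and the model is written with A stored as a factor in mydata rather than wrapped in as.factor().

```r
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))
n <- 10000
X1 <- rnorm(n)
A <- 2 * rbinom(n, 1, expit(X1)) - 1
X2 <- rnorm(n)
Y <- X1 + X2 + A * X1 + rnorm(n)
mydata <- data.frame(X1, A = as.factor(A), X2, Y)

mod <- lm(Y ~ X1 + X2 + A + A:X1, data = mydata)

# as.integer() on a factor returns the internal level codes (1 or 2),
# not the labels -1/1 and not the 0/1 dummy used in the fit
codes <- as.integer(mydata$A[1:5])

# Design matrix rows built the same way lm builds them;
# under treatment contrasts the column "A1" is 0 for A = -1 and 1 for A = 1
mm <- model.matrix(delete.response(terms(mod)), data = mydata[1:5, ])

# Manual prediction: design matrix times the coefficient vector
manual <- drop(mm %*% coef(mod))
pred <- predict(mod, newdata = mydata[1:5, ])

print(codes)
print(mm[, "A1"])
print(all.equal(unname(manual), unname(pred)))
```

Printing codes next to mm[, "A1"] shows which numeric values each approach multiplies the factor coefficients by, and the final all.equal line checks whether the design-matrix calculation agrees with predict.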