I have a hypothetical dataset with 4 mutually exclusive treatments, a continuous predictor, and a response variable. Each treatment is coded as a logical dummy variable that is either FALSE or TRUE.
I am trying to fit a linear regression model with an intercept fixed at zero (y ~ 0 + treat1 + treat2 + treat3 + treat4 + contvar) using the lm() function in R. I want to get coefficient estimates for all four treatments, as well as the continuous variable.
Here’s a reproducible example of my issue:
# Sample size
n <- 100
# Generate predictors, treatments 1 - 4 that are mutually exclusive, and a continuous variable
data <- data.frame(
  treat1 = c(rep(TRUE, 25), rep(FALSE, 75)),
  treat2 = c(rep(FALSE, 25), rep(TRUE, 25), rep(FALSE, 50)),
  treat3 = c(rep(FALSE, 50), rep(TRUE, 25), rep(FALSE, 25)),
  treat4 = c(rep(FALSE, 75), rep(TRUE, 25)),
  contvar = sample(0:100, n) / 100
)
# Draw per-observation effects around the four treatment means
mean1 <- rnorm(n, -0.5, 0.1)
mean2 <- rnorm(n, -0.2, 0.1)
mean3 <- rnorm(n,  0.2, 0.1)
mean4 <- rnorm(n,  0.5, 0.1)
# Generate response variable y based on the treatment means and the value of the continuous variable
data$y <- mean1 * data$treat1 + mean2 * data$treat2 + mean3 * data$treat3 + mean4 * data$treat4
data$y <- data$y * data$contvar
# Fit a no-intercept model
model0 <- lm(y ~ 0 + treat1 + treat2 + treat3 + treat4 + contvar, data = data)
# Summarize the no-intercept model
summary(model0)
The resulting output reads:
Call:
lm(formula = y ~ 0 + treat1 + treat2 + treat3 + treat4 + contvar,
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.258377 -0.035568 0.007104 0.037690 0.203017
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
treat1FALSE 0.01859 0.01803 1.031 0.305
treat1TRUE -0.02145 0.01881 -1.140 0.257
treat2TRUE -0.09835 0.02006 -4.904 3.88e-06 ***
treat3TRUE 0.09957 0.01985 5.017 2.44e-06 ***
treat4TRUE NA NA NA NA
contvar -0.04053 0.02468 -1.642 0.104
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07014 on 95 degrees of freedom
Multiple R-squared: 0.5521, Adjusted R-squared: 0.5285
F-statistic: 23.42 on 5 and 95 DF, p-value: 2.917e-15
There is a coefficient for treat1FALSE (unexpected) and for treat1TRUE, but none for treat4TRUE (also unexpected). I understand that estimating coefficients for all four treatments would not work in a model that includes an intercept, but I expected it to work in a zero-intercept model.
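To see where the singularity comes from, here is a small diagnostic sketch on the data frame built above, inspecting the design matrix that lm() constructs from this formula:
# Diagnostic sketch: look at how lm() expands the logical predictors
X <- model.matrix(~ 0 + treat1 + treat2 + treat3 + treat4 + contvar, data = data)
colnames(X)  # "treat1FALSE" "treat1TRUE" "treat2TRUE" "treat3TRUE" "treat4TRUE" "contvar"
ncol(X)      # 6 columns
qr(X)$rank   # should be 5: one column is a linear combination of the others
# Because the treatments are mutually exclusive and exhaustive,
# treat1FALSE == treat2TRUE + treat3TRUE + treat4TRUE, so one column is dropped (NA).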
Just to illustrate, this is what happens if I fit the default model with an intercept:
Call:
lm(formula = y ~ treat1 + treat2 + treat3 + treat4 + contvar,
data = data)
Residuals:
Min 1Q Median 3Q Max
-0.258377 -0.035568 0.007104 0.037690 0.203017
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01859 0.01803 1.031 0.3049
treat1TRUE -0.04004 0.01987 -2.015 0.0468 *
treat2TRUE -0.09835 0.02006 -4.904 3.88e-06 ***
treat3TRUE 0.09957 0.01985 5.017 2.44e-06 ***
treat4TRUE NA NA NA NA
contvar -0.04053 0.02468 -1.642 0.1039
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07014 on 95 degrees of freedom
Multiple R-squared: 0.5466, Adjusted R-squared: 0.5275
F-statistic: 28.63 on 4 and 95 DF, p-value: 1.296e-15
The treat1FALSE coefficient in the no-intercept model equals the intercept estimate in the model with an intercept. How should I interpret this?
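The equality can be checked directly (a sketch; model1 is just the name I use here for the with-intercept fit summarised above):
# Sketch: compare the two parameterisations
model1 <- lm(y ~ treat1 + treat2 + treat3 + treat4 + contvar, data = data)
coef(model0)[["treat1FALSE"]]   # 0.01859 in my run
coef(model1)[["(Intercept)"]]   # the same value
all.equal(fitted(model0), fitted(model1))  # TRUE: both fits give identical fitted values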
Finally, if I fit the model manually, by writing the no-intercept linear predictor as a function and minimising the residual sum of squares with optim(), I do get coefficients for all four treatments and for the continuous variable:
lm_manual <- function(b){
  b1 <- b[1]; b2 <- b[2]; b3 <- b[3]; b4 <- b[4]; bcont <- b[5]
  # No-intercept linear predictor
  yhat <- 0 + b1*data$treat1 + b2*data$treat2 + b3*data$treat3 + b4*data$treat4 + bcont*data$contvar
  # Return the residual sum of squares
  sum((yhat - data$y)^2)
}
out <- optim(par = rep(0, 5), fn = lm_manual)
out$par  # coefficients for treatments 1-4 and the continuous variable
This returns: -0.39267220 -0.08900373 0.11117684 0.36312218 -0.02455234
which is what I was expecting to get from lm(y ~ 0 + treat1 + treat2 + treat3 + treat4 + contvar, data = data) as well.
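As a cross-check (a sketch using lm.fit() on a hand-built numeric 0/1 design matrix, so that lm()'s factor expansion of the logicals is bypassed), ordinary least squares gives essentially the same numbers as optim():
# Cross-check sketch: least squares on a numeric design matrix built by hand
X <- cbind(treat1  = as.numeric(data$treat1),
           treat2  = as.numeric(data$treat2),
           treat3  = as.numeric(data$treat3),
           treat4  = as.numeric(data$treat4),
           contvar = data$contvar)
fit <- lm.fit(X, data$y)
fit$coefficients  # should match the optim() estimates up to the optimiser's tolerance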
How can I get such coefficient estimates using lm?