
Question

I want to calculate estimated group mean scores in a 2×2 Gaussian regression after obtaining the regression coefficients. Here is some toy data: 100 observations in each region (a and b), with sex (m and f) alternating, so each of the four region-by-sex cells has 50 observations. I have designed the scores so there is a 5-point difference on average between regions a and b but no difference between m and f.

set.seed(1234)

d <- data.frame(region = factor(rep(letters[1:2],each=100)),
                sex = factor(rep(c("m", "f"),times=100)),
                score = round(x = c(rnorm(100, mean = 5, sd = 1),
                                    rnorm(100, mean = 10, sd = 1)),
                              digits = 1))

Now I will use the model.matrix() function to obtain contrast coefficients for each observation, based on its group membership. I will use treatment coding, that is [0,1], with region a and sex f as the reference levels (factor() orders levels alphabetically, so f comes before m).
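
A quick way to confirm which level each factor treats as its reference (a small check added here, recreating `d` so it runs on its own) is to print the levels, the cell counts and the treatment contrast matrix. `contr.treatment()` assigns 0 to the first level, and `factor()` sorts levels alphabetically, so the reference levels are a and f.

```r
set.seed(1234)
d <- data.frame(region = factor(rep(letters[1:2], each = 100)),
                sex = factor(rep(c("m", "f"), times = 100)),
                score = round(c(rnorm(100, mean = 5, sd = 1),
                                rnorm(100, mean = 10, sd = 1)), digits = 1))

# factor() sorts levels alphabetically, so the first (reference) level
# of sex is "f", even though "m" appears first in the data
levels(d$region)
levels(d$sex)

# 50 observations in every region-by-sex cell
table(d$region, d$sex)

# Treatment coding: first level -> 0 (reference), second level -> 1
contr.treatment(nlevels(d$sex))
```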

model.matrix(object = score ~ region*sex,
             data = d,
             contrasts.arg = list(region = contr.treatment(nlevels(d$region)),
                                  sex = contr.treatment(nlevels(d$sex)))) -> cmTreat

Now we can use the model matrix directly in the regression with the lm() function. We specify 0 + in the formula because the model matrix already contains an intercept column.

(lm(d$score ~ 0 + cmTreat) -> lmTreat)

# output
# Call:
#   lm(formula = d$score ~ 0 + cmTreat)
# 
# Coefficients:
# cmTreat(Intercept)       cmTreatregion2          cmTreatsex2  cmTreatregion2:sex2  
#              4.814                5.132                0.056                0.140
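
Two quick checks on this fit, sketched under the assumption that the default `options(contrasts = ...)` is in effect: dropping `0 +` leaves `lm()` with two identical intercept columns, so one of them comes back as `NA`; and the ordinary formula interface, which builds the same treatment-coded design matrix internally, returns the same estimates under different coefficient names.

```r
set.seed(1234)
d <- data.frame(region = factor(rep(letters[1:2], each = 100)),
                sex = factor(rep(c("m", "f"), times = 100)),
                score = round(c(rnorm(100, mean = 5, sd = 1),
                                rnorm(100, mean = 10, sd = 1)), digits = 1))

cmTreat <- model.matrix(object = score ~ region * sex, data = d,
                        contrasts.arg = list(region = contr.treatment(nlevels(d$region)),
                                             sex = contr.treatment(nlevels(d$sex))))

# Without "0 +", lm() adds its own intercept, which duplicates the
# (Intercept) column of cmTreat; the aliased column is reported as NA
coef(lm(d$score ~ cmTreat))

# With "0 +", the model-matrix fit matches the usual formula fit
lmTreat <- lm(d$score ~ 0 + cmTreat)
lmFormula <- lm(score ~ region * sex, data = d)
all.equal(unname(coef(lmTreat)), unname(coef(lmFormula)))
```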

The regression has recovered the main effects and the interaction. But what if we want estimated marginal means, specifically the estimated mean in each ‘cell’ of the 2 × 2: region a/female, region a/male, region b/female and region b/male?

We can do this manually via the attributes of the model matrix.

treatCoefs <- coef(lmTreat) # assign the vector of coefficients a name

# mean in region a female: intercept[1] + region[0] + sex[0] + region[0]*sex[0]
regionA_f <- treatCoefs[1] + treatCoefs[2]*attr(cmTreat, which = "contrasts")$region[,1][1] + treatCoefs[3]*attr(cmTreat, which = "contrasts")$sex[,1][1] + treatCoefs[4]*attr(cmTreat, which = "contrasts")$region[,1][1]*attr(cmTreat, which = "contrasts")$sex[,1][1]

# mean in region a male: intercept[1] + region[0] + sex[1] + region[0]*sex[1]
regionA_m <- treatCoefs[1] + treatCoefs[2]*attr(cmTreat, which = "contrasts")$region[,1][1] + treatCoefs[3]*attr(cmTreat, which = "contrasts")$sex[,1][2] + treatCoefs[4]*attr(cmTreat, which = "contrasts")$region[,1][1]*attr(cmTreat, which = "contrasts")$sex[,1][2]

# mean in region b female: intercept[1] + region[1] + sex[0] + region[1]*sex[0]
regionB_f <- treatCoefs[1] + treatCoefs[2]*attr(cmTreat, which = "contrasts")$region[,1][2] + treatCoefs[3]*attr(cmTreat, which = "contrasts")$sex[,1][1] + treatCoefs[4]*attr(cmTreat, which = "contrasts")$region[,1][2]*attr(cmTreat, which = "contrasts")$sex[,1][1]

# mean in region b male: intercept[1] + region[1] + sex[1] + region[1]*sex[1]
regionB_m <- treatCoefs[1] + treatCoefs[2]*attr(cmTreat, which = "contrasts")$region[,1][2] + treatCoefs[3]*attr(cmTreat, which = "contrasts")$sex[,1][2] + treatCoefs[4]*attr(cmTreat, which = "contrasts")$region[,1][2]*attr(cmTreat, which = "contrasts")$sex[,1][2]
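
Each of the four expressions above is a dot product: a row of 0/1 values for (intercept, region, sex, region × sex), multiplied element-wise by the coefficients and summed. A sketch that writes those four rows out by hand as a matrix (`L` and `cellMeans` are illustrative names, not from the original) gets all four means in a single matrix multiplication with `%*%`:

```r
set.seed(1234)
d <- data.frame(region = factor(rep(letters[1:2], each = 100)),
                sex = factor(rep(c("m", "f"), times = 100)),
                score = round(c(rnorm(100, mean = 5, sd = 1),
                                rnorm(100, mean = 10, sd = 1)), digits = 1))
cmTreat <- model.matrix(score ~ region * sex, data = d,
                        contrasts.arg = list(region = contr.treatment(nlevels(d$region)),
                                             sex = contr.treatment(nlevels(d$sex))))
lmTreat <- lm(d$score ~ 0 + cmTreat)

# One row per cell; columns follow coef(lmTreat):
# (Intercept), region2 (b = 1), sex2 (m = 1), region2:sex2 (b and m = 1)
L <- rbind(a_f = c(1, 0, 0, 0),
           a_m = c(1, 0, 1, 0),
           b_f = c(1, 1, 0, 0),
           b_m = c(1, 1, 1, 1))

# Row-by-column products summed: intercept + region + sex + interaction
cellMeans <- drop(L %*% coef(lmTreat))
cellMeans
```

Each row of `L` is what a row of `cmTreat` looks like for an observation in that cell, so `unique(cmTreat)` contains the same four rows (in order of first appearance in the data).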

Now if we compare the actual group means to the estimated means (apologies to non-dplyr users)…

library(dplyr)
library(tibble) # add_column() comes from tibble
d %>%
  group_by(region, sex) %>%
    summarise(actualMean = mean(score)) %>%
      add_column(estMeans = c(regionA_f, regionA_m, regionB_f, regionB_m))

# # A tibble: 4 × 4
# # Groups:   region [2]
# region  sex    actualMean estMeans
# <fct>   <fct>        <dbl>    <dbl>
# 1 a      f           4.81     4.81
# 2 a      m           4.87     4.87
# 3 b      f           9.95     9.95
# 4 b      m           10.1     10.1
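
The estimated and actual means agree exactly here because `score ~ region*sex` is saturated: four coefficients for four cells, so least squares can reproduce every cell mean. A sketch comparing it with an additive model (`region + sex`, no interaction) shows that the agreement is a property of the model rather than of the calculation; `predict()` is used here only as a shortcut for the fitted cell values, and `cells`, `raw`, `fitFull` and `fitAdd` are illustrative names.

```r
set.seed(1234)
d <- data.frame(region = factor(rep(letters[1:2], each = 100)),
                sex = factor(rep(c("m", "f"), times = 100)),
                score = round(c(rnorm(100, mean = 5, sd = 1),
                                rnorm(100, mean = 10, sd = 1)), digits = 1))

# One row per cell, using the same factor levels as the data
cells <- data.frame(region = factor(c("a", "a", "b", "b"), levels = levels(d$region)),
                    sex = factor(c("f", "m", "f", "m"), levels = levels(d$sex)))

# Raw cell means, looked up in the 2 x 2 table by level name
rawMeans <- with(d, tapply(score, list(region, sex), mean))
raw <- rawMeans[cbind(as.character(cells$region), as.character(cells$sex))]

fitFull <- lm(score ~ region * sex, data = d)  # saturated: one parameter per cell
fitAdd <- lm(score ~ region + sex, data = d)   # 3 parameters for 4 cells

cbind(cells, raw,
      full = predict(fitFull, newdata = cells),
      additive = predict(fitAdd, newdata = cells))
```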

So this works. “What is the problem?” I hear you ask. Well, you saw how much code it took to get the estimated mean for each group. I can do it, but I was wondering: is there an easier way to do this manually?

I know I can use Russ Lenth’s excellent emmeans package, and I do use it a lot, but I wanted to learn how to do this manually in a more elegant way. I know nothing of matrix algebra and not a lot about contrast matrices. I just can’t help feeling that there is a better way (one that adapts more easily to different designs and numbers of levels).
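
One more general sketch, offered as an assumption about a tidier manual route rather than as how emmeans works internally: build a grid with every combination of factor levels, let `model.matrix()` turn it into dummy-coded rows using the fitted model's own terms and contrasts, and multiply by the coefficients. The idea is close in spirit to the "reference grid" that emmeans builds; `grid`, `X` and `regionMeans` are illustrative names.

```r
set.seed(1234)
d <- data.frame(region = factor(rep(letters[1:2], each = 100)),
                sex = factor(rep(c("m", "f"), times = 100)),
                score = round(c(rnorm(100, mean = 5, sd = 1),
                                rnorm(100, mean = 10, sd = 1)), digits = 1))

fit <- lm(score ~ region * sex, data = d)

# Every combination of factor levels: one row per cell (region varies fastest)
grid <- expand.grid(region = levels(d$region), sex = levels(d$sex))

# Dummy-coded design rows for the grid, built from the model's own terms;
# delete.response() drops 'score', which the grid does not have
X <- model.matrix(delete.response(terms(fit)), data = grid,
                  contrasts.arg = fit$contrasts)

# Each estimated cell mean is one row of X times the coefficient vector
grid$estMean <- as.vector(X %*% coef(fit))
grid

# Equally weighted marginal mean for each region: average the
# relevant rows of X first, then multiply by the coefficients
regionMeans <- sapply(levels(d$region), function(r)
  sum(colMeans(X[grid$region == r, , drop = FALSE]) * coef(fit)))
regionMeans
```

With a third factor or more levels, only the formula and the `expand.grid()` call change. `predict(fit, newdata = grid)` should return the same cell estimates, since it builds the design matrix in the same way.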

P.S. This question may have been better suited to Cross Validated, but I thought I would try here first, as it is just R-specific enough to warrant posting on SO.


Is there an easier way to manually calculate estimated group means using the model.matrix?