Thiết kế website giá rẻ

Question

I have some questions about how to get estimated marginal means from a bayesian multinomial logistic regression using the emmeans and brms packages. Data from my worked example comes from here. Apologies for calling the tidyverse suite but it’s really much easier than calling the individual packages

library(brms)
library(nnet)
library(emmeans)
library(foreign)
library(tidyverse)
library(magrittr)

The data set contains variables on 200 students. The outcome variable is prog, or program type, a three-level categorical variable (general, academic, vocation) indicating which program students enrolled in. The predictor variable is social economic status, ses, a three-level categorical variable, with low, middle, and high. To make this example simpler we will remove middle category so that this becomes a binary predictor, with levels low and high. We also need to make academic the reference level (just to make it consistent with the oarc web page)

# read in data
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")

# remove the 'middle' level from the ses predictor and make academic the reference levels for prog
ml %<>%
  select(id, ses, prog) %>%
    filter(ses != "middle") %>%
      mutate(ses = fct_drop(ses),
             prog = fct_relevel(prog, "academic"))

Let’s graph proportions in each program within each level of ses

ml %>%
  group_by(ses, prog) %>%
    summarise(count = n(),
              .groups = "drop_last") %>%
      mutate(tot = sum(count),
             perc = round(count/tot*100,1)) %>%
        ggplot(mapping = aes(y = perc,
                             x = prog,
                             fill = prog)) +
               geom_bar(stat = "identity") +
               facet_grid(.~ses) +
               theme(axis.title.x = element_blank(),
                     axis.text.x = element_blank(),
                     axis.ticks.x = element_blank())

Now let’s run the multinomial logistic regression using the multinom() function from the nnet package

test_nnet <- multinom(prog ~ ses,
                      data = ml)

summary(test_nnet)

# output
# Coefficients:
#          (Intercept)   seshigh
# general   -0.1718492 -1.368592
# vocation  -0.4595337 -1.332220

The results match what we can see with our own eyes from the graphs: The log odds of being in general program vs. in academic program will decrease by 1.37 if moving from ses=”low” to ses=”high” and the log odds of being in vocational program vs. in academic program will decrease by 1.33 if moving from ses=”low” to ses=”high”.

But what if we wanted some marginal effects. Say we wanted to know whether high ses people are more likely to enrol in an academic program than low ses people. We need some estimates of the effect of ses at each level of prog? Here we turn to the amazing emmeans package

# get the emmeans object
em_nnet <- emmeans(test_nnet,
                   specs = ~ses|prog)

# we want the output as log-odds so we need to re-grid
em_nnet_logit <- regrid(em_nnet,
                        transform = "logit") 

em_nnet_logit

# output
# prog = academic:
# ses    prob    SE df lower.CL upper.CL
# low  -0.388 0.297  4   -1.213    0.437
# high  0.965 0.294  4    0.149    1.781
# 
# prog = general:
# ses    prob    SE df lower.CL upper.CL
# low  -0.661 0.308  4   -1.516    0.193
# high -1.695 0.363  4   -2.701   -0.688
# 
# prog = vocation:
# ses    prob    SE df lower.CL upper.CL
# low  -1.070 0.335  4   -1.999   -0.142
# high -1.986 0.403  4   -3.105   -0.867

Now finally to get estimated differences in log-odds of enrolling in each program based on ses we use the emmeans::pairs function

(pairs_logit <- confint(pairs(em_nnet_logit, reverse = TRUE)))

# output
# prog = academic:
# contrast   estimate    SE df lower.CL upper.CL
# high - low    1.353 0.418  4    0.193    2.513
# 
# prog = general:
# contrast   estimate    SE df lower.CL upper.CL
# high - low   -1.033 0.476  4   -2.354    0.288
# 
# prog = vocation:
# contrast   estimate    SE df lower.CL upper.CL
# high - low   -0.915 0.524  4   -2.370    0.539
# 
# Results are given on the log odds ratio (not the response) scale. 
# Confidence level used: 0.95

As is very clear from the graph, the log-odds of people from a high-ses background enrolling in an academic course are much higher than people from a low-ses background.

So, that is the process in nnet. But I would like to do a Bayesian version of the analysis, using the `categorical’ family in the brms::brm() function.

test_brm <- brm(prog ~ ses,
                data = ml,
                family = "categorical",
                seed = 1234)

# warning, will take just under a minute to compile and run

Now if we compare coefficients from the brm() and nnet::multinom() models…

data.frame(coef = c("intercept_gen", "intercept_voc", "seshigh_gen", "seshigh_voc"),
           est_brm = round(as.data.frame(fixef(test_brm))$Estimate,2),
           est_nnet = round(c(as.data.frame(coef(test_nnet))$`(Intercept)`, as.data.frame(coef(test_nnet))$`seshigh`),2)) 

# output
#            coef est_brm est_nnet
# 1 intercept_gen   -0.15    -0.17
# 2 intercept_voc   -0.45    -0.46
# 3   seshigh_gen   -1.41    -1.37
# 4   seshigh_voc   -1.37    -1.33

…we get very comparable results

But the problem comes when we try to pass the brm() version into emmeans

emmeans(test_brm, specs = ~ses|prog)

It generates the following message

Error : emmeans is not yet supported for this brms model.
Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"),  : 
  Perhaps a 'data' or 'params' argument is needed

I tried doing the same thing with ref_grid() but got a similar message.

The best luck I had was splitting the analysis into two chunks

# estimated difference in log-odds of enrolling in general program based on ses
rg_brm_gen <- ref_grid(test_brm, dpar = "mugeneral")
em_gen <- emmeans(rg_brm_gen, specs = ~ses)
confint(pairs(em_gen, reverse = TRUE))

# output
# contrast   estimate lower.HPD upper.HPD
# high - low     -1.4     -2.46    -0.459
# 
# Point estimate displayed: median 
# HPD interval probability: 0.95 

# estimated difference in log-odds of enrolling in vocational program based on ses
rg_brm_voc <- ref_grid(test_brm, dpar = "muvocation")
em_voc <- emmeans(rg_brm_voc, specs = ~ses)
confint(pairs(em_voc, reverse = TRUE))

# output
# contrast   estimate lower.HPD upper.HPD
# high - low    -1.37     -2.49     -0.27
# 
# Point estimate displayed: median 
# HPD interval probability: 0.95

But there are a few major problems:

(1) I don’t know if I have done the brms model right. The error messages indicate that clearly this type of emmeans analysis is not yet supported the categorical family in brms (though many families are. For simple effects of ses at the level general the estimate from nnet and brms were -1.0 vs -1.4 and for vocational -0.9 vs -1.37. These don’t seem as similar as the default coefficients for the two models were, so I think I am not doing something right. With standard Gaussian or logistic regression I could calculate conditional effects from recombining various regression coefficients but here I am a bit lost. There doesn’t seem to be an intercept for academic at all that would make this possible.

(2) My dodgy, hacky way of using emmeans within brms is clearly missing any form of comparison at the academic level, something delivered automatically by the pairs function within emmeans via nnet.

Can anyone help me replicate my nnet analysis in brms, either code I am unaware of in emmeans or more manually via extracting the mcmc chains from the brms fit?

Thiết kế website giá rẻ

Danh mục

Calculating estimated marginal means using the emmeans package from a multinomial logistic regression model performed in brms