Consider this toy dataset, simulated with crossed random effects, and a mixed model fitted to it with lme4::lmer:
library(lme4)
set.seed(123)
# Number of subjects and items
num_subjects <- 30
num_items <- 20
# Generate subject and item IDs
subject <- factor(rep(1:num_subjects, each=num_items))
item <- factor(rep(1:num_items, num_subjects))
# Generate fixed effect predictor
x <- rnorm(num_subjects * num_items, mean=0, sd=1)
# Generate random intercepts for subjects and items
subject_intercepts <- rnorm(num_subjects, mean=0, sd=2)
item_intercepts <- rnorm(num_items, mean=0, sd=1)
# Create a data frame to hold the data
dt <- data.frame(subject, item, x)
# Add random intercepts to the data frame
dt$subject_intercept <- subject_intercepts[dt$subject]
dt$item_intercept <- item_intercepts[dt$item]
# Generate residual error
residual_error <- rnorm(num_subjects * num_items, mean=0, sd=1)
# Generate response variable y
dt$y <- 5 + 3 * dt$x + dt$subject_intercept + dt$item_intercept + residual_error
# Fit the linear mixed-effects model
model <- lmer(y ~ x + (1|subject) + (1|item), data=dt)
# Summarize the model
summary(model)
dt$subject_intercept <- NULL
dt$item_intercept <- NULL
write.csv(dt, "crossed_re_01.csv", row.names=FALSE)
The salient part of the summary output is:
Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 4.818    2.195
 item     (Intercept) 0.778    0.882
 Residual             1.030    1.015
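As a side note, these estimates are close to the simulated truth (SDs of 2, 1, and 1 for subject, item, and residual). A quick back-of-the-envelope variance partition from the numbers above (plain arithmetic on the reported components, nothing lme4-specific):

```python
# lme4 variance components, copied from the summary above
subject_var, item_var, resid_var = 4.818, 0.778, 1.030
total = subject_var + item_var + resid_var

# Share of the total variance attributable to each component
for name, v in [("subject", subject_var), ("item", item_var), ("residual", resid_var)]:
    print(f"{name}: {v / total:.3f}")
```

So roughly 73% of the variance is between subjects, 12% between items, and 16% residual.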
Now we use the same data in Python to fit (what we hope will be) the same model using statsmodels:
import pandas as pd
from statsmodels.regression.mixed_linear_model import MixedLM
data = pd.read_csv("crossed_re_01.csv")
# Fit the linear mixed-effects model with random intercepts for both subject and item
model = MixedLM.from_formula('y ~ x', data, re_formula='1', groups=data['subject'],
vc_formula={'item': '0 + C(item)'})
result = model.fit(reml=True, method='lbfgs')
# Print the summary of the model
print(result.summary())
# Extracting variance components
subject_var = result.cov_re.iloc[0, 0] # Variance of subject random effect
item_var = result.vcomp[0] # Variance of item random effect
residual_var = result.scale # Residual variance
print("\nRandom Effects Variance Components:")
print(f"Subject Variance: {subject_var}")
print(f"Item Variance: {item_var}")
print("\nResidual Variance:")
print(f"Residual Variance: {residual_var}")
And the relevant output is:
                Coef.  Std.Err.     z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept       4.526     0.403  11.225  0.000   3.736   5.316
x               2.943     0.058  50.820  0.000   2.829   3.056
Group Var       4.786
item Var        0.443
==============================================================
Random Effects Variance Components:
Subject Variance: 4.786
Item Variance: 0.442
Residual Variance:
Residual Variance: 1.363
I was expecting fairly similar results for the variance components. The subject random-intercept variances agree quite well, but the item and residual variances are markedly different.
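One arithmetic observation (my own, not something either library reports): the item and residual variances sum to nearly the same value in both fits, which makes me suspect the item component is being partly folded into the residual on the Python side:

```python
# Variance components copied from the two summaries above
lme4_item, lme4_resid = 0.778, 1.030   # lme4
sm_item, sm_resid = 0.442, 1.363       # statsmodels

print(round(lme4_item + lme4_resid, 3))  # 1.808
print(round(sm_item + sm_resid, 3))      # 1.805
```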
My next step is going to be a Monte-Carlo simulation, but before cracking on with that I want to make sure that I'm doing the right things in Python?
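In case it helps frame the question, here is a minimal numpy sketch of the data-generating step for that Monte-Carlo simulation (the function name and defaults are my own; the lmer/MixedLM fitting step would go inside the loop):

```python
import numpy as np

def simulate_crossed(num_subjects=30, num_items=20, subj_sd=2.0, item_sd=1.0,
                     resid_sd=1.0, beta0=5.0, beta1=3.0, seed=None):
    """One simulated dataset with crossed subject/item random intercepts."""
    rng = np.random.default_rng(seed)
    n = num_subjects * num_items
    subject = np.repeat(np.arange(num_subjects), num_items)  # 0,0,...,1,1,...
    item = np.tile(np.arange(num_items), num_subjects)       # 0,1,...,19,0,1,...
    x = rng.normal(0.0, 1.0, n)
    subj_re = rng.normal(0.0, subj_sd, num_subjects)
    item_re = rng.normal(0.0, item_sd, num_items)
    y = beta0 + beta1 * x + subj_re[subject] + item_re[item] + rng.normal(0.0, resid_sd, n)
    return subject, item, x, y

# Monte-Carlo skeleton: fit each replicate with lmer and/or MixedLM,
# and collect the estimated variance components for comparison.
for rep in range(3):
    subject, item, x, y = simulate_crossed(seed=rep)
    # ... fit the model here, store result.cov_re / result.vcomp / result.scale ...
```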