I work with longitudinal (repeated measures) models of clinical trial data. Patients are randomized to different treatment groups and measured over multiple pre-specified points in time. An example dataset is the FEV dataset from the mmrm
package, where ARMCD
denotes treatment groups and AVISIT
denotes discrete time points:
library(tidyverse)
data("fev_data", package = "mmrm")
data <- fev_data %>%
as_tibble() %>%
mutate(ARMCD = as.character(ARMCD)) %>%
select(-VISITN, -VISITN2)
data
#> # A tibble: 800 × 8
#> USUBJID AVISIT ARMCD RACE SEX FEV1_BL FEV1 WEIGHT
#> <fct> <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 PT1 VIS1 TRT Black or African American Female 25.3 NA 0.677
#> 2 PT1 VIS2 TRT Black or African American Female 25.3 40.0 0.801
#> 3 PT1 VIS3 TRT Black or African American Female 25.3 NA 0.709
#> 4 PT1 VIS4 TRT Black or African American Female 25.3 20.5 0.809
#> 5 PT2 VIS1 PBO Asian Male 45.0 NA 0.465
#> 6 PT2 VIS2 PBO Asian Male 45.0 31.5 0.233
#> 7 PT2 VIS3 PBO Asian Male 45.0 36.9 0.360
#> 8 PT2 VIS4 PBO Asian Male 45.0 48.8 0.507
#> 9 PT3 VIS1 PBO Black or African American Female 43.5 NA 0.682
#> 10 PT3 VIS2 PBO Black or African American Female 43.5 36.0 0.892
#> # ℹ 790 more rows
#> # ℹ Use `print(n = ...)` to see more rows
It is common to use a so-called “constrained longitudinal data analysis” (cLDA) approach which pools all treatment groups at baseline. If time were continuous with baseline at AVISIT == 0
, it would be easy to enforce cLDA in the model formula:
formula <- FEV1 ~ AVISIT + AVISIT:ARMCD
However, this does not work with discrete time because the extra AVISITVIS1:ARMCDTRT
term distinguishes between baselines for PBO
and TRT
.
colnames(model.matrix(formula, data = data))
#> [1] "(Intercept)" "AVISITVIS2" "AVISITVIS3" "AVISITVIS4"
#> [5] "AVISITVIS1:ARMCDTRT" "AVISITVIS2:ARMCDTRT" "AVISITVIS3:ARMCDTRT" "AVISITVIS4:ARMCDTRT"
A workaround I have seen others use is to manually enforce cLDA in the data.
count(data, ARMCD, AVISIT)
#> # A tibble: 8 × 3
#> ARMCD AVISIT n
#> <chr> <fct> <int>
#> 1 PBO VIS1 105
#> 2 PBO VIS2 105
#> 3 PBO VIS3 105
#> 4 PBO VIS4 105
#> 5 TRT VIS1 95
#> 6 TRT VIS2 95
#> 7 TRT VIS3 95
#> 8 TRT VIS4 95
clda <- mutate(data, ARMCD = ifelse(AVISIT == "VIS1", "PBO", ARMCD))
count(clda, ARMCD, AVISIT)
#> # A tibble: 7 × 3
#> ARMCD AVISIT n
#> <chr> <fct> <int>
#> 1 PBO VIS1 200
#> 2 PBO VIS2 105
#> 3 PBO VIS3 105
#> 4 PBO VIS4 105
#> 5 TRT VIS2 95
#> 6 TRT VIS3 95
#> 7 TRT VIS4 95
But the terms in the model are all the same.
colnames(model.matrix(formula, data = clda))
#> [1] "(Intercept)" "AVISITVIS2" "AVISITVIS3" "AVISITVIS4"
#> [5] "AVISITVIS1:ARMCDTRT" "AVISITVIS2:ARMCDTRT" "AVISITVIS3:ARMCDTRT" "AVISITVIS4:ARMCDTRT"
Even worse, the model matrix is no longer full rank.
as.integer(Matrix::rankMatrix(model.matrix(formula, data = clda)))
#> [1] 7
Is there a way to code up the formula and/or contrasts to use cLDA without having to recode the data or model matrix?
It seems oddly low-level to have to manually generate a custom model matrix and then call lm()
or brms::brm()
on the hacked model matrix instead of the actual data. And it would prevent me from straightforwardly working with many of the nice post-processing tools that brms
provides.