I am building a Cox PH model using the survival package in R and would like to include a time-dependent coefficient for my categorical variable. Reproducible data setup:
library(survival)
# Data
stanford <- stanford2
stanford$age_cat <- ifelse(stanford$age > 35, "old", "young")
Working from the time-dependent vignette here for the survival package, I need to use the tt() function. Attempt 1 revealed I needed dummy coding.
mod.fail <- coxph(Surv(time, status) ~ tt(age_cat),
data = stanford,
tt = function(x, t, ...) x*t)
Error in x * t : non-numeric argument to binary operator
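As far as I can tell, the error arises because age_cat is a character vector, so the product x*t inside tt() is undefined. Here is a minimal sketch of a check outside coxph(); the three time values and the names is_char, res and failed are made up purely for illustration:

```r
library(survival)

# Same setup as above
stanford <- stanford2
stanford$age_cat <- ifelse(stanford$age > 35, "old", "young")

# age_cat is stored as character, so it cannot be multiplied by time
is_char <- is.character(stanford$age_cat)

# Reproduce the failure directly; try() keeps the script running.
# The times 10, 20, 30 are arbitrary illustrative values.
res <- try(stanford$age_cat[1:3] * c(10, 20, 30), silent = TRUE)
failed <- inherits(res, "try-error")

print(c(is_char = is_char, failed = failed))
```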
So I added an indicator variable.
# Create dummy coding of age_cat
stanford$age_cat_d <- ifelse(stanford$age_cat == "old", 1, 0)
Now I am confused about how to specify the model properly. Both of the models below run, but I am not sure which one correctly lets the effect of the age category vary over time.
# Model 1
mod.t1 <- coxph(Surv(time, status) ~ tt(age_cat_d),
data = stanford,
tt = function(x, t, ...) x*t)
# Model 2
mod.t2 <- coxph(Surv(time, status) ~ age_cat_d + tt(age_cat_d),
data = stanford,
tt = function(x, t, ...) x*t)
Below is how I think the effect of the age category (the log hazard ratio of old vs. young) at time = 200 should be computed in each model. The results show that the two models differ.
# Model 1
coef(mod.t1)[1]*200
tt(age_cat_d)
0.04425679
# Model 2
coef(mod.t2)[1]+coef(mod.t2)[2]*200
age_cat_d
0.5424105
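To compare the two specifications at more than one time point, here is a sketch that tabulates the implied log hazard ratio over a grid of times. It assumes my reading of the models is right, namely that Model 1 implies an effect of b*t and Model 2 an effect of a + b*t. The time grid and the names tt_linear, b1, b2 and effects are my own choices for illustration, and the dummy is built directly from age, which is equivalent to the coding above.

```r
library(survival)

# Same data setup as above; dummy built directly from age (1 = "old")
stanford <- stanford2
stanford$age_cat_d <- ifelse(stanford$age > 35, 1, 0)

# Linear-in-time transform, as in both models above
tt_linear <- function(x, t, ...) x * t

mod.t1 <- coxph(Surv(time, status) ~ tt(age_cat_d),
                data = stanford, tt = tt_linear)
mod.t2 <- coxph(Surv(time, status) ~ age_cat_d + tt(age_cat_d),
                data = stanford, tt = tt_linear)

# Arbitrary illustrative grid of follow-up times
times <- c(0, 100, 200, 500, 1000)

b1 <- unname(coef(mod.t1))  # slope only
b2 <- unname(coef(mod.t2))  # intercept, then slope

# Implied log hazard ratio (old vs. young) at each time
effects <- data.frame(
  time   = times,
  model1 = b1[1] * times,          # forced to 0 at time 0
  model2 = b2[1] + b2[2] * times   # free to be non-zero at time 0
)
print(effects)
```

If this reading is correct, the visible difference is that Model 1 forces the effect to be exactly zero at time 0, whereas Model 2 estimates a baseline effect plus a linear change over time.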
So, is either of the above models the correct way to implement a time-dependent coefficient for the age category? The examples in the linked vignette (and other guides for using tt() I’ve found) focus on time-dependent coefficients for continuous variables. (Note: the above example is just for reproducibility; I am not arguing that such a time-dependent model is appropriate for the given data.)
[1]: https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf