I am trying to learn how to use tmerge (from the survival package) to split follow-up time and create time-varying covariates (TVCs) for survival analyses.
I think I mostly understand the general idea, but I am stuck on how to correctly generate indicator variables for the relevant time periods.
I have simulated data for the case of one TVC with a start and end time, and I am able to get a correct indicator from it (perhaps by luck).
But when I simulate two TVCs whose start and end times can overlap, I get lost.
Can anyone help?
library(survival)
library(dplyr)
# Simulate data with 1 TVC
set.seed(1324)
dat <- data.frame(id = 1:10,
                  treat1_start = round(runif(10, 10, 50)),
                  treat1_end = round(runif(10, 50, 100)),
                  obs_time = round(runif(10, 100, 200)),
                  event = rbinom(10, 1, 0.5))
# Insert some missingness (i.e. some obs with no treatment)
dat$treat1_start[4] <- dat$treat1_end[4] <- dat$treat1_start[7] <- dat$treat1_end[7] <- NA
# Create dataframe in counting process format
# This first step splits time at the start of treatment and also creates the event variable
dat_cp <- tmerge(data1 = dat,
                 data2 = dat |> select(id, obs_time, event, treat1_start),
                 id = id,
                 event = event(obs_time, event),
                 treat1_start_period = tdc(treat1_start))
# This second step splits time at the end of treatment and creates a correct treatment indicator (treat1_end_period)
dat_cp <- tmerge(data1 = dat_cp,
                 data2 = dat_cp |> select(id, treat1_end),
                 id = id,
                 treat1_end_period = event(treat1_end))
# treat1_end_period gives the correct treatment indicator in this case
# Simulate data with 2 TVCs
set.seed(1324)
dat <- data.frame(id = 1:10,
                  treat1_start = round(runif(10, 10, 50)),
                  treat1_end = round(runif(10, 50, 100)),
                  treat2_start = round(runif(10, 10, 50)),
                  treat2_end = round(runif(10, 50, 100)),
                  obs_time = round(runif(10, 100, 200)),
                  event = rbinom(10, 1, 0.5))
# Insert some missingness (i.e. some obs with no treatment)
dat$treat1_start[4] <- dat$treat1_end[4] <- dat$treat1_start[7] <- dat$treat1_end[7] <- NA
# Create dataframe in counting process format
# This first step splits time at the start of treatment and also creates the event variable
dat_cp <- tmerge(data1 = dat,
                 data2 = dat |> select(id, obs_time, event, treat1_start, treat2_start),
                 id = id,
                 event = event(obs_time, event),
                 treat1_start_period = tdc(treat1_start),
                 treat2_start_period = tdc(treat2_start))
# This second step splits time at the end of treatment
dat_cp <- tmerge(data1 = dat_cp,
                 data2 = dat_cp |> select(id, treat1_end, treat2_end),
                 id = id,
                 treat1_end_period = event(treat1_end),
                 treat2_end_period = event(treat2_end))
# But how do I create a correct treatment indicator for each treatment period?
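For reference, here is a minimal sketch of one direction that might work. It is not a confirmed solution, and it assumes the three-argument form tdc(time, value, init) available in recent versions of survival. The idea is to treat each treatment as a series of "switch" records (value 1 at the start time, 0 at the end time) and let tdc() carry the last value forward, instead of marking the end with event(). The data frame toy, the helper make_changes(), and the column names time, on, status, treat1 and treat2 are all hypothetical and used only for illustration.

```r
library(survival)

# Hypothetical toy data (not the simulated data above):
# subject 3 never receives treatment 1.
toy <- data.frame(
  id           = 1:3,
  treat1_start = c(10, 20, NA),
  treat1_end   = c(60, 70, NA),
  treat2_start = c(30, 15, 40),
  treat2_end   = c(80, 50, 90),
  obs_time     = c(100, 120, 150),
  event        = c(1, 0, 1)
)

# Hypothetical helper: one row per change point, with on = 1 when the
# treatment starts and on = 0 when it ends. Rows with a missing time
# (untreated subjects) are dropped explicitly.
make_changes <- function(id, start, end) {
  out <- rbind(
    data.frame(id = id, time = start, on = 1),
    data.frame(id = id, time = end,   on = 0)
  )
  out <- out[!is.na(out$time), ]
  out[order(out$id, out$time), ]
}

changes1 <- make_changes(toy$id, toy$treat1_start, toy$treat1_end)
changes2 <- make_changes(toy$id, toy$treat2_start, toy$treat2_end)

# Step 1: set the follow-up window and the outcome event.
cp <- tmerge(data1 = toy["id"], data2 = toy, id = id,
             status = event(obs_time, event))

# Step 2: add each treatment as a time-dependent covariate that starts
# at 0 (the init argument) and takes the value of `on` after each change point.
cp <- tmerge(cp, changes1, id = id, treat1 = tdc(time, on, 0))
cp <- tmerge(cp, changes2, id = id, treat2 = tdc(time, on, 0))

print(cp)
```

Because each treatment gets its own tdc() column, the extra splits introduced by the other treatment should not disturb the indicator, which seems to be what goes wrong when the end of treatment is marked with event().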