Our data is like this: People were firstly assessed at baseline as being either state a
or b
. During the follow up we recorded again what state the people are in (again a
vs. b
). Here is some dummy data:
set.seed(1)
n <- 10000
states <- letters[1:2]
old <- factor(sample(states, n, replace= TRUE))
new <- ifelse(old == states[1], sample(states, sum(old == states[1]), replace= TRUE, prob= c(.75, .25)),
sample(states, sum(old == states[2]), replace= TRUE, prob= c(.5, .5)))
new <- factor(new, levels= c("censored", levels(old)))
time <- rnorm(n, 100, 10) # as time interval between baseline and follow-up
Our research question is with what probability people change states or remain in the same state over time. What firstly we did is:
survival::survfit(formula= Surv(time, new) ~ old) |>
ci_plot()
The problem with this plot is that it neglects the initial state of the people. In the left plot the red line shows how people change from the initial state (s0)
to a
or b
. But a
is the initial state! So what we did is that we added up the probabilities of the events (s0)
and a
to show the probability of staying in state a
over time. Same was done for b
. This gives the following plot:
survival::survfit(formula= Surv(time, new) ~ old) |>
ci_plot_summed()
That plot is exactly what we expect. But it feels a bit hacky to sum up probabilities of the fit. I realized that survival::survfit
has an istate
argument which is defined as “Each subject’s initial state can be specied by the istate argument.”. But setting the argument does not work as expected. How to use survival::survfit
properly so that (s0)
is understood as the actual initial state? What we tried is
survival::survfit(formula= Surv(time, new) ~ old, istate= old) |>
ci_plot()
survival::survfit(formula= Surv(time, new) ~ 1, istate= old) |>
ci_plot()
But this all returns wrong plots. The plot functions are defined as:
# Based on https://rdrr.io/cran/survminer/src/R/ggcompetingrisks.R
# Most important change is the use of geom_line instead of geom_area
ci_plot <- function(fit) {
psta <- as.data.frame(fit$pstate)
if (is.null(fit$strata)) {
psta$strata <- "all"
} else {
psta$strata <- rep(names(fit$strata), fit$strata)
}
psta$times <- fit$time
pstal <- tidyr::gather(psta, event, value, -strata, -times)
ggplot(pstal, aes(times, value, col=event)) +
geom_line() + facet_wrap(~strata)
}
ci_plot_summed <- function(fit) {
psta <- as.data.frame(fit$pstate)
psta$strata <- rep(names(fit$strata), fit$strata)
psta$times <- fit$time
pstal <- tidyr::gather(psta, event, value, -strata, -times)
pstal$value[pstal$strata == unique(pstal$strata)[1] & pstal$event == "a"] <- pstal$value[pstal$strata == unique(pstal$strata)[1] & pstal$event == "(s0)"] + pstal$value[pstal$strata == unique(pstal$strata)[1] & pstal$event == "a"]
pstal$value[pstal$strata == unique(pstal$strata)[2] & pstal$event == "b"] <- pstal$value[pstal$strata == unique(pstal$strata)[2] & pstal$event == "(s0)"] + pstal$value[pstal$strata == unique(pstal$strata)[2] & pstal$event == "b"]
pstal <- pstal[pstal$event != "(s0)", ]
ggplot(pstal, aes(times, value, col=event)) +
geom_line() + facet_wrap(~strata)
}