I am trying to simulate visit schedules for participants in a study. I will start with a dataframe of participants and first visit dates, and my goal is to create columns with the dates of later visits.
Participants will have study visits at certain intervals (in days) after their first visit. The intervals depend on the study arm. I have stored these intervals in one vector per arm.
I want to end up with a dataframe where each row is a participant and there are date columns for each visit number. I am able to accomplish this with case_when, but I have many visits and arms (not shown in the reprex). The output from the example works fine, but I am looking for a more robust and parsimonious solution.
library(dplyr)
# set up schedules
arm1 <- c(0, 14, 28)
arm2 <- c(0, 14, 19, 180)
arm3 <- c(0, 14, 28, 32)
# simulate dataframe for 30 participants
d0 <- data.frame(
  ids = 1:30,
  studyarm = rep(c("arm1", "arm2", "arm3"), 10),
  # 10 distinct start dates, repeated 3 times to cover all 30 participants
  date_visit1 = rep(seq(as.Date("2024-06-01"), as.Date("2024-06-10"), by = 1), times = 3))
# add visit dates depending on arm
d1 <- d0 %>%
  mutate(
    date_visit2 = case_when(
      studyarm == "arm1" ~ date_visit1 + arm1[2],
      studyarm == "arm2" ~ date_visit1 + arm2[2],
      studyarm == "arm3" ~ date_visit1 + arm3[2]
    ),
    date_visit3 = case_when(
      studyarm == "arm1" ~ date_visit1 + arm1[3],
      studyarm == "arm2" ~ date_visit1 + arm2[3],
      studyarm == "arm3" ~ date_visit1 + arm3[3]
    ),
    date_visit4 = case_when(
      studyarm == "arm1" ~ date_visit1 + arm1[4], # correctly returns NA
      studyarm == "arm2" ~ date_visit1 + arm2[4],
      studyarm == "arm3" ~ date_visit1 + arm3[4]
    )
  )
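One direction that might scale better is to keep the schedules in a single named list, pad them into an arm-by-visit lookup matrix, and loop over the visit numbers instead of writing one case_when per visit. The following is only a rough sketch under assumptions: the names `schedules`, `offsets`, and `d1_sketch` are made up for illustration, it uses base R only, and it assumes every value in `studyarm` matches a name in the list.

```r
# Schedules kept together in one named list (names must match studyarm values)
schedules <- list(
  arm1 = c(0, 14, 28),
  arm2 = c(0, 14, 19, 180),
  arm3 = c(0, 14, 28, 32)
)

# Same simulated participants as in the reprex
d0 <- data.frame(
  ids = 1:30,
  studyarm = rep(c("arm1", "arm2", "arm3"), 10),
  date_visit1 = rep(seq(as.Date("2024-06-01"), as.Date("2024-06-10"), by = 1), times = 3)
)

# Pad each schedule with NA up to the longest one, then arrange as a
# matrix with one row per arm and one column per visit number
max_visits <- max(lengths(schedules))
offsets <- t(sapply(schedules, function(x) x[seq_len(max_visits)]))

# Add one date column per visit after the first; arms with fewer visits get NA
d1_sketch <- d0
for (v in seq_len(max_visits)[-1]) {
  arm_offset <- unname(offsets[as.character(d0$studyarm), v])
  d1_sketch[[paste0("date_visit", v)]] <- d0$date_visit1 + arm_offset
}

head(d1_sketch)
```

With this layout, adding an arm or a visit should only require changing `schedules`; whether a join or pivot-based version would be cleaner for the real data is an open question.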