I have a dataset that looks like this.
# create data frame example
df1 <- data.frame(
  "ckdate" = c("2016-04-28 22:30:00", "2016-05-14 12:00:00", "2016-05-15 02:00:00",
               "2016-05-16 00:00:00", "2016-05-17 03:30:00"),
  "lab_value" = c(12, 35, 9, 55, 90),
  "pid" = c(1, 1, 2, 2, 2)
)
The goal here is to normalize the date-times into 12-hour blocks and then calculate the average “lab_value” within those blocks across patients, so that I can ultimately make a plot showing the average lab value at each 12-hour time point. Here is what I have coded up so far:
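library(dplyr)

normalize_dates <- function(df) {
  df <- df %>%
    # parse the character timestamps into POSIXct date-times
    mutate(ckdate = as.POSIXct(ckdate, format = "%Y-%m-%d %H:%M:%S")) %>%
    # hours elapsed since the earliest timestamp in the data
    mutate(hours_since_first = as.numeric(difftime(ckdate, min(ckdate), units = "hours"))) %>%
    # integer-divide by 12 to get the index of the 12-hour block
    mutate(hours_since_first = hours_since_first %/% 12)
  return(df)
}
df2 <- normalize_dates(df1) # This returns NAs for all differences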
There are a number of missing (NA) values. So the first thing I am wondering is whether difftime can handle NA, and if so, is there an na.omit option? Or is there a different way to calculate these differences?
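One possible explanation, offered as a minimal sketch rather than a confirmed diagnosis: difftime passes NA inputs through as NA outputs, but min() returns NA whenever any element is NA, which would turn every difference into NA. The vector below is hypothetical example data (not from the real dataset), and tz = "UTC" is an assumption made only to keep the arithmetic reproducible.

```r
# Hypothetical timestamps with one missing value (not the real data)
ck <- as.POSIXct(c("2016-04-28 22:30:00", NA, "2016-05-14 12:00:00"),
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# min() is NA as soon as any element is NA ...
naive_first <- min(ck)

# ... unless the NA values are dropped explicitly
first <- min(ck, na.rm = TRUE)

# difftime() has no na.rm argument; an NA input simply yields an NA output
hours <- as.numeric(difftime(ck, first, units = "hours"))

# index of the 12-hour block each timestamp falls into (NA stays NA)
block <- hours %/% 12
```

With this change only the rows whose timestamp is missing end up with an NA block, instead of every row.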
Then for calculating the means I have:
# finally, for each 12 hour block, take the average lab_value across pids
df2 %>%
  group_by(hours_since_first) %>%
  summarize(mean_lab_value = mean(lab_value))
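As a rough sketch of how the summary step might behave once the blocks exist, the example below uses a small hypothetical data frame (the `blocks` object and its values are made up for illustration). It assumes that rows without a block should be dropped and that missing lab values should be ignored when averaging, which may or may not suit the real analysis.

```r
library(dplyr)

# Hypothetical block indices and lab values, including missing entries
blocks <- data.frame(
  block     = c(0, 0, 1, 1, 2, NA),
  lab_value = c(12, 35, 9, NA, 90, 40)
)

block_means <- blocks %>%
  filter(!is.na(block)) %>%   # drop rows whose timestamp could not be assigned to a block
  group_by(block) %>%
  summarize(mean_lab_value = mean(lab_value, na.rm = TRUE))  # ignore missing lab values
```

The result has one row per 12-hour block, which is the shape needed for plotting the mean lab value against the block index.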
But I can’t really test this yet because I can’t get the 12 hour blocks. I work a lot in R, but have very little experience working with date times, so any help here would be greatly appreciated.