I have a data frame that looks like this:
# example data
data <- data.frame(ID = c("ab", "bc", "cd", "de", "ef"),
                   t1 = c(NA, "N", NA, "S", NA),
                   t2 = c("N", NA, NA, "A", NA),
                   t3 = c("N", "S", NA, NA, NA),
                   t4 = c("N", "S", "A", NA, NA))
Essentially, each row contains the states (represented by N, S, or A) that occurred for one ID at different time points (t1, t2, etc.).
I would like to (1) replace the NAs that occur before the first state in each row with "none" (e.g., ID ab at t1) and (2) repeat each state along the row until a new state is reached.
# desired output
output.data <- data.frame(ID = c("ab", "bc", "cd", "de", "ef"),
                          t1 = c("none", "N", "none", "S", "none"),
                          t2 = c("N", "N", "none", "A", "none"),
                          t3 = c("N", "S", "none", "A", "none"),
                          t4 = c("N", "S", "A", "A", "none"))
I found the post "Replacing NAs with latest non-NA value", which uses zoo::na.locf() to fill in NAs. This is great, but it still leaves part 1 of my question unsolved.
The post "Replace initial NA values with zero in a row until non NA column" also seems very promising, but none of its solutions work for me. They don't throw an error, but they do not produce the desired output.
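To make the two rules concrete, here is a minimal base R sketch of how I read them, applied row by row. The helper name fill_states and its start argument are hypothetical, and I am assuming that an ID with no state at all (ef) should end up as "none" in every column.

```r
# Example data from the question
data <- data.frame(ID = c("ab", "bc", "cd", "de", "ef"),
                   t1 = c(NA, "N", NA, "S", NA),
                   t2 = c("N", NA, NA, "A", NA),
                   t3 = c("N", "S", NA, NA, NA),
                   t4 = c("N", "S", "A", NA, NA))

# Hypothetical helper: fill one row of states from left to right.
# `start` is the label used until the first real state appears.
fill_states <- function(x, start = "none") {
  last <- start
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- last   # carry the previous value (or `start`) forward
    } else {
      last <- x[i]   # a new state becomes the value to carry
    }
  }
  x
}

time_cols <- setdiff(names(data), "ID")

filled <- data
# apply() over rows returns one column per row, so transpose the result back
filled[time_cols] <- as.data.frame(
  t(apply(data[time_cols], 1, fill_states)),
  stringsAsFactors = FALSE
)

print(filled)
```

The loop is deliberately explicit so that both rules are visible in one place; a vectorised version (for example, zoo::na.locf() applied per row, followed by replacing the remaining leading NAs) should be equivalent under the same assumptions.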