Here is how I defined my condition, but I need the row just above each matching row as well. There are many observations per ID.
data2 <- data2 %>%
  group_by(ID_number) %>%
  filter(time_diff_hour > 8.000 | is.na(time_diff_hour))
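You can add filter conditions with the lead function. lead looks at the value in the next row, so testing lead(time_diff_hour) keeps the row just above each matching row. Conversely, if you need the row just below each match, use the lag function, which looks at the value in the previous row.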
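data2 <- data2 %>%
  group_by(ID_number) %>%
  # Keep a row if it matches, or if the next row in the same ID matches.
  # default = FALSE stops the last row of each ID (which has no next row)
  # from being kept only because lead() pads with NA.
  filter(time_diff_hour > 8.000 | is.na(time_diff_hour) |
           lead(time_diff_hour > 8.000 | is.na(time_diff_hour), default = FALSE))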
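To see why lead picks up the row above a match, here is a small sketch on a made-up vector. The values and the names `x`, `hit` and `above_hit` are assumptions for illustration only and do not come from the question's data.

```r
library(dplyr)

x <- c(5, 9, 3, NA)          # made-up values for illustration

lead(x)                      # next row's value:      9  3 NA NA
lag(x)                       # previous row's value: NA  5  9  3

# Flag rows that match the condition themselves
hit <- x > 8 | is.na(x)      # FALSE  TRUE FALSE  TRUE

# Flag rows sitting just above a match; default = FALSE covers the last row
above_hit <- lead(hit, default = FALSE)   # TRUE FALSE  TRUE FALSE
```

Combining `hit | above_hit` then keeps every matching row together with the row directly above it.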
Here is a solution in base R:
# Generate a sample data frame
data2 <- data.frame(ID_number = rep(factor(sample(1000:9999, 10, replace=TRUE)), 4),
time_diff_hour = sample(c(NA, 4:12), 40, replace=TRUE))
# Find indices matching the criteria
# Initial matches
i <- which(data2$time_diff_hour > 8 | is.na(data2$time_diff_hour))
# Combine the initial matches with the rows just above them
i <- sort(unique(c(i, i - 1)))
# Drop index 0, which appears when row 1 is a match
i <- i[i > 0]
data2[i, ]
I realise this is quite clunky (if this can be simplified/shortened, please let me know!) – one of the reasons why we have dplyr/tidyverse…
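On the simplification question, one shorter base R variant, offered as a sketch and not a definitive replacement, is to shift the logical match vector by one position instead of manipulating indices. The names `hit`, `keep` and `result` are my own, and the small data frame is made up for illustration. Like the index version above, it ignores ID_number, so the "row above" can belong to a different ID.

```r
# Made-up example data (not from the question)
data2 <- data.frame(ID_number = c(1, 1, 1, 2, 2, 2),
                    time_diff_hour = c(NA, 3, 9, 2, 4, 10))

# TRUE where the row itself meets the condition; there are no NAs here,
# because the is.na() branch turns a missing value into TRUE
hit <- data2$time_diff_hour > 8 | is.na(data2$time_diff_hour)

# Shift `hit` up by one so each row sees the flag of the row below it;
# the last row has no row below it, so it gets FALSE
keep <- hit | c(hit[-1], FALSE)

result <- data2[keep, ]
```

With this data, rows 1, 2, 3, 5 and 6 are kept: rows 1, 3 and 6 match on their own, and rows 2 and 5 sit directly above a match.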