I have a df like this one:
ID | FOLLOW-UP | RECURRENCE | RECURRENCE DATE |
---|---|---|---|
15 | 1 | no | |
15 | 2 | yes | 2003-05-15 |
16 | 1 | no | |
29 | 1 | yes | 2002-06-10 |
29 | 2 | yes | 2002-12-16 |
34 | 1 | yes | 2009-11-15 |
34 | 2 | no | |
34 | 3 | yes | 2012-05-03 |
36 | 1 | no | |
36 | 2 | no |
With in the first column the patient identifier (ID), in the 2nd the follow-up number, in the 3rd column the occurrence of a recurrence (yes/no) and in the last column the date of recurrence (if recurrence = yes).
There are therefore repeated lines for several patients, if they have several follow-ups.
I’d like to obtain a df with only 1 line per patient.
- If the patient has had several recurrences, I’d like to keep the line with the oldest recurrence (date of first recurrence).
- If the patient has had several follow-ups with recurrence and without recurrence, I’d like to keep the line with the (oldest) recurrence.
- If the patient has several follow-ups without recurrence, I’d like to keep follow-up 1.
In the end, I’d like to get this df here:
ID | FOLLOW_UP | RECURRENCE | RECURRENCE_DATE |
---|---|---|---|
15 | 2 | yes | 2003-05-15 |
16 | 1 | no | |
29 | 1 | yes | 2002-06-10 |
34 | 1 | yes | 2009-11-15 |
36 | 1 | no |
Here’s a reproducible example :
df <- data.frame(ID = c("15", "15", "16","29", "29", "34","34", "34", "36","36"),
FOLLOW_UP = c("1", "2", "1","1", "2", "1","2", "3", "1","2"),
RECURRENCE = c("no", "yes", "no","yes", "yes", "yes","no", "yes", "no","no"),
RECURRENCE_DATE = c("", "2003-05-15", "","2002-06-10", "2002-12-16", "2009-11-15","", "2012-05-03", "","")
)
I thought of using the “group by” and “slice” functions but I don’t know how to apply them with conditions.
Thanks in advance