I am trying to implement a complex filter but am not sure how to make it work.
I have two datasets:
data_1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  speciale = c(80003, 80003, 80003, 80004, 80004, 80005, 80005, 80006, 80005, 80003),
  week = c("0109", "0110", "0111", "0212", "0209", "0209", "0309", "0309", "0310", "0311")
)
data_2 <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  speciale = c(80001, 80001, 80001, 80001, 80001, 80001, 80001, 80001, 80001, 80001, 80001,
               80001),
  week = c("0109", "0109", "0110", "0111", "0212", "0212", "0209", "0209", "0309", "0309",
           "0310", "0311")
)
data_1 contains IDs with specific codes in the variable “speciale”, and these codes are associated with the “speciale” codes in data_2.
What I would like to do is exclude observations from data_2 that match data_1 on the ID and week variables, one for one. That is, each row in data_1 should be paired with only one row in data_2, and only that one matching row should be removed from data_2.
For example, for ID = 1, data_1 has one observation each for week “0109” and week “0212”, while data_2 has two identical observations for each of those weeks. In each case I want to remove only one of the two records and keep the other. (In my actual dataset, data_2 might have 3 observations that match a single observation in data_1; in that case I still want to remove only 1 and keep the other two.)
By contrast, for ID = 2, ID = 3 and ID = 4, both of the associated observations in data_2 should be removed, because for each of these IDs the two observations in data_1 have two matching observations in data_2.
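To make the rule concrete, the one-for-one removal described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: `occurrence()` is a hypothetical helper (not from any package) that numbers repeated rows within each ID/week group, and duplicate rows in data_2 are assumed to be interchangeable, so it does not matter which copy is dropped.

```r
# Example data copied from the question.
data_1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  speciale = c(80003, 80003, 80003, 80004, 80004, 80005, 80005, 80006, 80005, 80003),
  week = c("0109", "0110", "0111", "0212", "0209", "0209", "0309", "0309", "0310", "0311")
)
data_2 <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4),
  speciale = rep(80001, 12),
  week = c("0109", "0109", "0110", "0111", "0212", "0212", "0209", "0209", "0309", "0309",
           "0310", "0311")
)

# Hypothetical helper: number the rows within each ID/week group in row
# order, so the first copy gets 1, the second copy gets 2, and so on.
occurrence <- function(df) {
  ave(seq_len(nrow(df)), df$ID, df$week, FUN = seq_along)
}

# Build a key from ID, week and occurrence number for both data frames.
# The n-th copy in data_1 can then only match the n-th copy in data_2.
key_1 <- paste(data_1$ID, data_1$week, occurrence(data_1), sep = "_")
key_2 <- paste(data_2$ID, data_2$week, occurrence(data_2), sep = "_")

# Keep only the data_2 rows whose key has no counterpart in data_1.
# Assumption: the earlier copies are the ones removed, later copies are kept.
result <- data_2[!key_2 %in% key_1, ]
print(result)
```

With the example data, only the surplus ID = 1 rows for weeks “0109” and “0212” should remain. The same idea could presumably be written with dplyr by adding the occurrence number as a column in both data frames and using anti_join() on ID, week and that column.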
I do not want to use distinct() on data_2 because my real dataset is more complex than this small example.
Thanks.