I’m looking to filter the data in one dataframe (Target
) based of whether or not a value in another dataframe (Reference
) falls between two dates in the Target
dataframe.
For example, this would be my raw data (Target
):
ParticipantId | Date1 | Date2 |
---|---|---|
10001 | 1/02/2010 | 1/02/2015 |
10001 | 3/02/2016 | 1/02/2018 |
10001 | 1/02/2019 | 1/02/2020 |
10001 | 1/02/2021 | 1/02/2023 |
10002 | 1/02/2016 | 1/02/2018 |
10002 | 1/02/2019 | 1/02/2020 |
10002 | 1/02/2021 | 1/02/2023 |
10003 | 1/02/2013 | 1/02/2020 |
10003 | 1/02/2021 | 1/02/2023 |
And this would be my reference data (Reference
):
ParticipantId | DateA |
---|---|
10001 | 3/12/2013 |
10002 | 5/15/2022 |
10003 | 9/20/2022 |
What I would like is a filtered output of of Target
where DateA
in Reference
falls between Date1
and Date2
in Target
like so:
ParticipantId | Date1 | Date2 |
---|---|---|
10001 | 1/02/2010 | 1/02/2015 |
10002 | 1/02/2021 | 1/02/2023 |
10003 | 1/02/2021 | 1/02/2023 |
If anyone would be able to provide some input as to how this could be done using dplyr
with pipes, I would greatly appreciate it.
Code to make the dataframes may be found below but you may need to load the lubridate
library.
Reference <- structure(
list(
ParticipantId = 10001:10003,
DateA = c("3/12/2013", "5/15/2022", "9/20/2022")
),
class = "data.frame",
row.names = c(NA, -3L)
)
Target <- structure(
list(
ParticipantId = c(
10001L,
10001L,
10001L,
10001L,
10002L,
10002L,
10002L,
10003L,
10003L
),
Date1 = c(
"1/2/2010",
"1/2/2016",
"1/2/2019",
"1/2/2021",
"1/2/2016",
"1/2/2019",
"1/2/2021",
"1/2/2019",
"1/2/2021"
),
Date2 = c(
"1/2/2015",
"1/2/2018",
"1/2/2020",
"1/2/2023",
"1/2/2018",
"1/2/2020",
"1/2/2023",
"1/2/2020",
"1/2/2023"
)
),
class = "data.frame",