In R, I have two data frames, df1 and df2. Each has two columns: an ID column and an email column. Both email columns are list columns, where each row holds several emails; in other words, one ID can be associated with more than one email. My goal is to match ID1 in df1 to ID2 in df2 through shared emails. If any email in email1 matches any email in email2, regardless of which row it comes from, I can conclude from that match that, for example, ID1 1234 corresponds to ID2 6653. My final data frame would have at least ID1, ID2, matched_Email…
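One common way to express this kind of matching is to "explode" both list columns into long format (one row per ID and email) and then do an ordinary inner join on the email. Below is a minimal sketch on toy data. The column names ID1/email1 and ID2/email2, and all of the email values, are hypothetical stand-ins, because the question only describes "an ID col and an email col".

```r
# Hypothetical toy versions of df1 and df2; ID1/email1 and ID2/email2 are assumed names
df1 <- data.frame(ID1 = c(1234, 5678))
df1$email1 <- list(c("a@x.com", "b@x.com"), "c@x.com")
df2 <- data.frame(ID2 = c(6653, 7001, 7002))
df2$email2 <- list(c("z@x.com", "b@x.com"), "q@x.com", c("c@x.com", "a@x.com"))

# Explode each list column: one row per (ID, email) pair
long1 <- data.frame(ID1 = rep(df1$ID1, lengths(df1$email1)),
                    matched_email = unlist(df1$email1, use.names = FALSE))
long2 <- data.frame(ID2 = rep(df2$ID2, lengths(df2$email2)),
                    matched_email = unlist(df2$email2, use.names = FALSE))

# Inner join on the email string: each output row is one shared email
matches <- merge(long1, long2, by = "matched_email")[, c("ID1", "ID2", "matched_email")]
print(matches)
```

If you only need the ID mapping, `unique(matches[, c("ID1", "ID2")])` keeps one row per ID pair.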
My main concern is that df1 has ~300k rows and df2 has ~3M rows. They are large enough that I'm not sure how to build the dataset I want efficiently.
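Comparing every pair of rows would mean 300,000 × 3,000,000 = 9 × 10^11 checks. A join on exploded tables does work roughly proportional to the number of (ID, email) rows plus the number of matches, not to the product of the two table sizes. The sketch below wraps that idea in a hypothetical helper, `match_ids()`, which also drops NA emails (base `merge()` would otherwise match NA to NA), removes repeated (ID, email) rows so they do not multiply in the join, and collapses the result to one row per ID pair. The column names ID1/email1 and ID2/email2 are assumptions.

```r
# Hypothetical column names: ID1/email1 in df1 and ID2/email2 in df2 (adjust to yours).

# One row per (ID, email). NA emails are dropped, because merge() would match
# NA to NA, and repeated pairs are dropped so they do not multiply join rows.
explode <- function(ids, emails, id_name) {
  out <- data.frame(id = rep(ids, lengths(emails)),
                    email = as.character(unlist(emails, use.names = FALSE)))
  names(out)[1] <- id_name
  unique(out[!is.na(out$email), ])
}

# Join on email, then collapse to one row per (ID1, ID2) pair
match_ids <- function(df1, df2) {
  hits <- merge(explode(df1$ID1, df1$email1, "ID1"),
                explode(df2$ID2, df2$email2, "ID2"),
                by = "email")
  if (nrow(hits) == 0) {
    return(data.frame(ID1 = numeric(0), ID2 = numeric(0),
                      matched_email = character(0)))
  }
  # All shared emails for a pair, sorted and joined with ";"
  out <- aggregate(email ~ ID1 + ID2, data = hits,
                   FUN = function(e) paste(sort(e), collapse = ";"))
  names(out)[names(out) == "email"] <- "matched_email"
  out
}

# Tiny example (hypothetical data)
df1 <- data.frame(ID1 = c(1234, 5678))
df1$email1 <- list(c("a@x.com", "b@x.com"), "c@x.com")
df2 <- data.frame(ID2 = c(6653, 7002))
df2$email2 <- list(c("b@x.com", "b@x.com"), c("c@x.com", "a@x.com", "b@x.com"))
res <- match_ids(df1, df2)
print(res)
```

At 300k and 3M rows, packages such as data.table usually join faster and with less memory than base `merge()`, but the explode-then-join structure stays the same.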
Would appreciate any help, thank you!!
I currently have this function in my code, but it has been running for about 2 hours and R is still busy…
# TRUE if any email in list1 also appears in list2
email_match <- function(list1, list2) { any(sapply(list1, function(x) any(x %in% list2))) }
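For a single pair of email vectors, `email_match()` is cheap. The time goes into calling it once for every (ID1, ID2) combination. The sketch below shows two things on hypothetical toy data (column names ID1/email1 and ID2/email2 are assumed). First, the function body reduces to one vectorised `%in%` call. Second, the pairwise calls can be replaced by an inverted index, built once from df2, that maps each email to the ID2 values containing it, so that each df1 email is looked up rather than compared against every row of df2. The named list built with `split()` is just one possible index structure, chosen for illustration.

```r
# Toy data (hypothetical); column names ID1/email1 and ID2/email2 are assumed
df1 <- data.frame(ID1 = c(1234, 5678))
df1$email1 <- list(c("a@x.com", "b@x.com"), "c@x.com")
df2 <- data.frame(ID2 = c(6653, 7001, 7002))
df2$email2 <- list(c("z@x.com", "b@x.com"), "q@x.com", c("c@x.com", "a@x.com"))

# Same answer as email_match() for one pair, without the sapply() loop
email_match_vec <- function(list1, list2) any(list1 %in% list2)

# Inverted index built once from df2: names are emails, values are the ID2s
# whose email list contains that email (NA emails are dropped by split())
index <- split(rep(df2$ID2, lengths(df2$email2)),
               unlist(df2$email2, use.names = FALSE))

# For each df1 row, look its emails up in the index instead of scanning df2
rows <- lapply(seq_len(nrow(df1)), function(i) {
  emails <- unique(df1$email1[[i]])
  hits <- index[intersect(emails, names(index))]
  if (length(hits) == 0) return(NULL)
  data.frame(ID1 = df1$ID1[i],
             ID2 = unlist(hits, use.names = FALSE),
             matched_email = rep(names(hits), lengths(hits)))
})
result <- do.call(rbind, rows)
print(result)
```

The `lapply()` here is still an R-level loop over df1, so on the full data a single `merge()` on exploded tables is likely to be faster; the index mainly shows why avoiding pairwise comparisons removes the bottleneck.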