I’m currently working with some phones numbers DataFrames and I need to merge them in a substring criteria way and has to be an left merge (meaning to have all combinations in case there is more than 1 coincidence).
For example these 2 DataFrames:
(ID’s here are for illustration)
PHONES
| id | phone |
| -- | ------------ |
| 1 | +12399998888 |
| 2 | +12377776666 |
| 3 | +1233399998888 |
PHONES_NO_EXT
| id | phone |
| -- | -------- |
| 1 | 99998888 |
| 2 | 99998888 |
| 3 | 77776666 |
RESULT_NEEDED
| id | phone | phone_no_ext |
| -- | ------------ | ------------- |
| 1 | +12399998888 | 99998888 |
| 1 | +12399998888 | 99998888 |
| 2 | +12377776666 | 77776666 |
| 3 | +1233399998888 | 99998888 |
| 3 | +1233399998888 | 99998888 |
Note you have always at least 1 coincidence from PHONES_NO_EXT that belongs to PHONES.
Take a look on ID 3 of PHONES, it’s impossible to know the prefix, but still need that coincidence.
One approach was doing all combinations (creating an len(PHONES) * len(PHONES_NO_EXT)
rows new DataFrame) and then making an df.apply
to filter based on substring (you can imagine this solution cannot be scalable).
How to do this merge in the most efficient way possible ?
Kadooken is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.