I am searching the text of a web page for names from a list held in a data frame. If a name appears in a paragraph, I want to know which paragraph, so I can parse certain parts of that paragraph and associate them with the name.
I have two data frames:
dfrule, which is made up of ‘Paragraphs’ and ‘Ids’, and eldf, which is made up of ‘Names’ and ‘Ids’.
So far I have:
substring_matches = eldf['name'].apply(lambda s1: dfrule['Paragraphs'].apply(lambda s2: s1 in s2).any())
matchdf = eldf[substring_matches]
This gives me every name on the list that matched any paragraph, but not the Id of the paragraph it matched. How can I associate each matched name with the Id of the paragraph it appears in?
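
One possible way to keep the paragraph Id, shown only as a minimal sketch: pair every name with every paragraph using a cross join, then keep the pairs where the name is a substring of the paragraph. The sample rows below are made up for illustration, the column names 'name', 'Paragraphs' and 'Ids' are assumed from the code above, and the suffixes '_name' and '_para' are hypothetical labels chosen to tell the two 'Ids' columns apart. It assumes pandas 1.2 or later, where merge accepts how="cross".

```python
import pandas as pd

# Made-up sample data; the real frames come from the web page and the name list.
eldf = pd.DataFrame({
    "name": ["Alice Smith", "Bob Jones", "Carol White"],
    "Ids": [1, 2, 3],
})
dfrule = pd.DataFrame({
    "Paragraphs": [
        "Alice Smith spoke first.",
        "Then Bob Jones and Alice Smith left.",
        "Nobody is mentioned here.",
    ],
    "Ids": [10, 11, 12],
})

# Cross join: one row per (name, paragraph) pair.
# Both frames have an 'Ids' column, so the suffixes keep them apart.
pairs = eldf.merge(dfrule, how="cross", suffixes=("_name", "_para"))

# Keep only the pairs where the name occurs inside the paragraph text.
is_match = [n in p for n, p in zip(pairs["name"], pairs["Paragraphs"])]
matchdf = pairs[is_match].reset_index(drop=True)

print(matchdf[["name", "Ids_name", "Ids_para"]])
```

Each row of matchdf then carries the name, its own Id and the Id of the paragraph that contains it, so a name found in several paragraphs appears once per paragraph. The cross join builds len(eldf) * len(dfrule) rows, so this is only reasonable when both frames are fairly small.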