I have a list of my team members (about 300 of us) and a separate list of invitees to a party (about 4,000 people). How do I find out which team members have been invited to the party?
Here’s a scaled-down example. Note that the invitee names are in varying formats:
my_team_list = ['Andy', 'Bernice', 'Charlotte', 'David', 'Evan']
invitee_list = ['Mandy (Team A)', 'Navin - Team A', 'Olive Team B', 'Peter Team C', 'Queenie (D)', 'Royston -D team', 'Steven (E team)', 'Tammy (E team)', 'Bernice (Z team)', 'Victor (A Team)', 'Wendy (Team B)', 'David (Team Z)']
for name in my_team_list:
    for invitee in invitee_list:
        if name in invitee:
            print(invitee)
#output
Bernice (Z team)
David (Team Z)
The code above is a brute-force solution, and its time complexity is O(n × m). Is there a more efficient way to do this?
It looks like both lists contain only first names, so you could clean up invitee_list by splitting each entry on spaces and keeping the first token, then take a set intersection.
my_team_list = ['Andy', 'Bernice', 'Charlotte', 'David', 'Evan']
invitee_list = ['Mandy (Team A)', 'Navin - Team A', 'Olive Team B', 'Peter Team C', 'Queenie (D)', 'Royston -D team', 'Steven (E team)', 'Tammy (E team)', 'Bernice (Z team)', 'Victor (A Team)', 'Wendy (Team B)', 'David (Team Z)']
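# Keep only the first space-separated token of each invitee entry
for i in range(len(invitee_list)):
    invitee_list[i] = invitee_list[i].split(" ")[0]
# The set intersection gives the names present in both lists
print(set(my_team_list) & set(invitee_list))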
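The intersection above returns bare names rather than the original invitee strings. If you want the same output as in the question, one possible variation is to index the invitees by their first token and look each team member up in that index. This is only a sketch: the helper name `find_invited` is made up for illustration, and it assumes the first whitespace-separated token of every entry is the person's name.

```python
from collections import defaultdict

my_team_list = ['Andy', 'Bernice', 'Charlotte', 'David', 'Evan']
invitee_list = ['Mandy (Team A)', 'Navin - Team A', 'Olive Team B', 'Peter Team C',
                'Queenie (D)', 'Royston -D team', 'Steven (E team)', 'Tammy (E team)',
                'Bernice (Z team)', 'Victor (A Team)', 'Wendy (Team B)', 'David (Team Z)']


def find_invited(team, invitees):
    """Hypothetical helper: return the invitee entries whose first token is a team name."""
    # Build the index once: first token -> all original entries starting with it
    by_first_token = defaultdict(list)
    for entry in invitees:
        tokens = entry.split()
        if tokens:  # skip empty or whitespace-only entries
            by_first_token[tokens[0]].append(entry)
    # One dictionary lookup per team member instead of a scan of every invitee
    return [entry for name in team for entry in by_first_token.get(name, [])]


matches = find_invited(my_team_list, invitee_list)
print(matches)  # ['Bernice (Z team)', 'David (Team Z)']
```

Building the index takes one pass over the invitees and each lookup is a hash lookup on average, so the work grows roughly with n + m rather than n × m. Unlike the substring test in the question, this only matches when the first token equals the name exactly.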
If your format is more complex than you’ve posted (e.g. there are names with spaces in them, or other funny characters), you might consider using a regular expression:
import re
my_team_list = ['Andy', 'Bernice', 'Charlotte', 'David', 'Evan']
invitee_list = ['Mandy (Team A)', 'Navin - Team A', 'Olive Team B', 'Peter Team C', 'Queenie (D)', 'Royston -D team', 'Steven (E team)', 'Tammy (E team)', 'Bernice (Z team)', 'Victor (A Team)', 'Wendy (Team B)', 'David (Team Z)']
my_team_re = re.compile("|".join(re.escape(name) for name in my_team_list))
for name in invitee_list:
    if my_team_re.search(name):
        print(name)
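This finds the same matching invitees as your current solution (each one is printed once, in invitee-list order) and should run substantially faster in practice, because each invitee string is scanned by a single compiled pattern instead of being tested against every team member in a Python loop. Note that Python's re module is a backtracking engine that tries the alternatives in turn, so strict O(n+m) time is not guaranteed.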
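Both the substring test and the pattern above will also match a team name that appears inside a longer name. If that is a concern, one option is to wrap the alternation in `\b` word boundaries so that only whole words match. The sketch below uses made-up sample data and a hypothetical helper name, `compile_team_pattern`, purely for illustration; it is case-sensitive, like the original code.

```python
import re


def compile_team_pattern(team):
    """Hypothetical helper: build one pattern that matches any team name as a whole word."""
    alternatives = "|".join(re.escape(name) for name in team)
    # \b on both sides stops 'Ann' from matching inside 'Annabel'
    return re.compile(r"\b(?:" + alternatives + r")\b")


# Made-up sample data, chosen to show the difference from a plain substring match
team = ['Andy', 'Bernice', 'David', 'Ann']
invitees = ['Mandy (Team A)', 'Bernice (Z team)', 'Annabel - Team B',
            'David (Team Z)', 'Ann (Team C)']

pattern = compile_team_pattern(team)
matches = [entry for entry in invitees if pattern.search(entry)]
print(matches)  # ['Bernice (Z team)', 'David (Team Z)', 'Ann (Team C)']
```

Here 'Annabel - Team B' is skipped even though it contains 'Ann', which a plain `in` test would have matched. Whether that is the behaviour you want depends on how the real invitee names are written.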