I’m running a study where we’re matching a group of participants with each other. Each participant has an ID, a work position, a work area, and a text they’ve submitted.
The idea is to give each participant six matches. The first three matches are other participants who will give something to that participant: the first match is from the same work area, whereas the second and third matches are from different work areas. The last three matches (fourth, fifth, and sixth) are other participants who will receive something from that participant: the fourth match is from the same work area, whereas the fifth and sixth matches are from different work areas.
Now, I’ve managed to generate the first three matches using Google Sheets and Apps Script. The restrictions were as described above, plus two more: a participant could not be used as someone’s first match more than once, and could not be used more than twice in total across the second and third matches.
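The two usage limits above can be checked directly on the sheet once it is loaded into pandas. The following is only a minimal sketch: the helper name `usage_counts` and the toy data are made up for illustration, and the column names `ID`, `Match1_ID`, `Match2_ID`, and `Match3_ID` are assumed from the table layout in this post (the posted script uses `PROLIFIC_ID` and `Match2_PID` style names instead).

```python
import pandas as pd


def usage_counts(df, columns):
    """Count how often each ID is used as a match across the given columns."""
    # Pool the columns into one Series and drop empty (unmatched) cells.
    pooled = pd.concat([df[c] for c in columns], ignore_index=True).dropna()
    return pooled.value_counts()


# Hypothetical toy data, not taken from the study.
toy = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Match1_ID": ["B", "A", None, "C"],
    "Match2_ID": ["C", "C", "A", None],
    "Match3_ID": ["D", None, "D", "A"],
})

match1_use = usage_counts(toy, ["Match1_ID"])
match23_use = usage_counts(toy, ["Match2_ID", "Match3_ID"])

# IDs that break the limits: more than once as Match1,
# more than twice in total across Match2 and Match3.
over_match1 = match1_use[match1_use > 1]
over_match23 = match23_use[match23_use > 2]
```

If both `over_match1` and `over_match23` come back empty, the first three match columns respect the limits, which is what the reverse matching for matches four to six relies on.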
Some cells are empty, because some researchers were not matched.
Matches four, five, and six should follow from the first three matches. For example, if researcher A has researcher B as their first match, this means that researcher B should have researcher A as their fourth match.
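Because nobody is used as a first match more than once, the rule for the fourth match amounts to inverting a one-to-one mapping. A minimal sketch of that inversion, assuming the column names `ID` and `Match1_ID` from the table in this post; the helper `reverse_one_to_one`, the output column `Match4_ID`, and the toy data are hypothetical:

```python
import pandas as pd


def reverse_one_to_one(df, id_col, match_col):
    """For each row, return the participant who lists this row's ID in match_col."""
    matched = df.dropna(subset=[match_col])
    # The inversion is only well defined if nobody is used twice.
    if matched[match_col].duplicated().any():
        raise ValueError(f"{match_col} uses some participant more than once")
    # Map giver (the value in match_col) to receiver (the row's own ID).
    giver_to_receiver = dict(zip(matched[match_col], matched[id_col]))
    return df[id_col].map(giver_to_receiver)


# Hypothetical toy data: A has B as first match, so B should get A as fourth match.
toy = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Match1_ID": ["B", "A", None, "C"],
})
toy["Match4_ID"] = reverse_one_to_one(toy, "ID", "Match1_ID")
```

Participants who are nobody’s first match (here `D`) end up with an empty fourth match, so empty cells in the result are expected.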
I’ve made some attempts with Excel and with ChatGPT’s Python coding. I was able to successfully use the first matches to reverse-match the fourth matches. However, when trying this for the fifth and sixth match, I was unable to reach the same number of matches as for the second and third matches.
I would appreciate any help! I’ve attached the GPT code with Python below. I’ve also attached a table with the columns I’m using in Excel (going onwards from the end of Match1).
Thanks in advance!
| ID | Position | Area | Text | Match1_ID | Match1_Position | Match1_Area | Match1_Text |
import pandas as pd

# Load the new dataset
file_path_match2_match3 = '/mnt/data/Contact list - first to second phase - manual.xlsx'
df_match2_match3 = pd.read_excel(file_path_match2_match3)

# Ensure the DataFrame has the necessary Match5 and Match6 columns
columns_to_add = ['Match5_PID', 'Match5_Position', 'Match5_PrimaryArea', 'Match5_FeedbackText',
                  'Match6_PID', 'Match6_Position', 'Match6_PrimaryArea', 'Match6_FeedbackText']
for col in columns_to_add:
    if col not in df_match2_match3.columns:
        df_match2_match3[col] = None

# Create dictionaries to track matches
match2_dict = {}
match3_dict = {}

# Populate the dictionaries with reciprocal matches for Match5 and Match6
for index, row in df_match2_match3.iterrows():
    researcher_id = row['PROLIFIC_ID']
    match2_pid = row['Match2_PID']
    match3_pid = row['Match3_PID']
    if pd.notna(match2_pid):
        if match2_pid not in match2_dict:
            match2_dict[match2_pid] = []
        match2_dict[match2_pid].append(researcher_id)
    if pd.notna(match3_pid):
        if match3_pid not in match3_dict:
            match3_dict[match3_pid] = []
        match3_dict[match3_pid].append(researcher_id)

# Flatten the dictionaries back into the DataFrame
for match2_pid, researcher_ids in match2_dict.items():
    for researcher_id in researcher_ids:
        match2_row = df_match2_match3[df_match2_match3['PROLIFIC_ID'] == match2_pid]
        if not match2_row.empty:
            index = match2_row.index[0]
            if pd.isna(df_match2_match3.loc[index, 'Match5_PID']):
                df_match2_match3.at[index, 'Match5_PID'] = researcher_id
                df_match2_match3.at[index, 'Match5_Position'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'Position'].values[0]
                df_match2_match3.at[index, 'Match5_PrimaryArea'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'PrimaryArea'].values[0]
                df_match2_match3.at[index, 'Match5_FeedbackText'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'FeedbackText'].values[0]
            elif pd.isna(df_match2_match3.loc[index, 'Match6_PID']):
                df_match2_match3.at[index, 'Match6_PID'] = researcher_id
                df_match2_match3.at[index, 'Match6_Position'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'Position'].values[0]
                df_match2_match3.at[index, 'Match6_PrimaryArea'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'PrimaryArea'].values[0]
                df_match2_match3.at[index, 'Match6_FeedbackText'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'FeedbackText'].values[0]

for match3_pid, researcher_ids in match3_dict.items():
    for researcher_id in researcher_ids:
        match3_row = df_match2_match3[df_match2_match3['PROLIFIC_ID'] == match3_pid]
        if not match3_row.empty:
            index = match3_row.index[0]
            if pd.isna(df_match2_match3.loc[index, 'Match6_PID']):
                df_match2_match3.at[index, 'Match6_PID'] = researcher_id
                df_match2_match3.at[index, 'Match6_Position'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'Position'].values[0]
                df_match2_match3.at[index, 'Match6_PrimaryArea'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'PrimaryArea'].values[0]
                df_match2_match3.at[index, 'Match6_FeedbackText'] = df_match2_match3.loc[df_match2_match3['PROLIFIC_ID'] == researcher_id, 'FeedbackText'].values[0]

# Save the updated DataFrame to a new Excel file
updated_file_path_match2_match3_corrected = '/mnt/data/Contact list - automated corrected.xlsx'
df_match2_match3.to_excel(updated_file_path_match2_match3_corrected, index=False)

# Verification
researchers_with_match2_corrected = df_match2_match3['Match2_PID'].notna().sum()
researchers_with_match5_corrected = df_match2_match3['Match5_PID'].notna().sum()
researchers_with_match3_corrected = df_match2_match3['Match3_PID'].notna().sum()
researchers_with_match6_corrected = df_match2_match3['Match6_PID'].notna().sum()
researchers_with_match2_corrected, researchers_with_match5_corrected, researchers_with_match3_corrected, researchers_with_match6_corrected, updated_file_path_match2_match3_corrected
I’ve mainly tried variations of the code that I posted.
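One possible source of the shortfall, as far as the posted code shows: the loop over the third matches only ever tries the Match6 slot, so a participant who is used twice as someone’s third match keeps only one of the two reverse matches. A minimal sketch of an alternative that pools the second and third matches per participant and then fills the fifth and sixth slots in order. The helper `reverse_pooled` and the toy data are hypothetical, the column names are assumed from the table in this post, and the sketch relies on the limit of two uses in total across the second and third matches.

```python
import pandas as pd
from collections import defaultdict


def reverse_pooled(df, id_col, source_cols, target_cols):
    """Pool all reverse matches from source_cols and spread them over target_cols."""
    receivers = defaultdict(list)
    for col in source_cols:
        for owner, match in zip(df[id_col], df[col]):
            if pd.notna(match):
                # `match` gives to `owner`, so `owner` is one of `match`'s receivers.
                receivers[match].append(owner)

    out = df.copy()
    for slot, target in enumerate(target_cols):
        out[target] = [
            receivers.get(pid, [])[slot] if len(receivers.get(pid, [])) > slot else None
            for pid in out[id_col]
        ]
    # Anyone with more receivers than slots would lose matches; report them.
    overflow = {pid: owners for pid, owners in receivers.items()
                if len(owners) > len(target_cols)}
    return out, overflow


# Hypothetical toy data; D is used twice as a third match.
toy = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "Match2_ID": ["C", "C", "A", None],
    "Match3_ID": ["D", None, "D", "A"],
})
result, overflow = reverse_pooled(
    toy, "ID", ["Match2_ID", "Match3_ID"], ["Match5_ID", "Match6_ID"]
)

given = toy[["Match2_ID", "Match3_ID"]].notna().sum().sum()
received = result[["Match5_ID", "Match6_ID"]].notna().sum().sum()
```

With this approach only the totals are comparable: `given` and `received` should be equal when `overflow` is empty, but the Match5 column on its own need not have the same count as the Match2 column, because a reverse match from a third match can land in the fifth slot and vice versa.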