This question is similar to one I posted here a while ago: "Comparing 2 Pandas dataframes row by row and performing a calculation on each row".
I got a very helpful answer to that question and I’m trying to use that information to help me answer my current question.
Task: Group a dataframe by the columns trial, RECORDING_SESSION_LABEL, and IP_INDEX. Within each group, for every row from Row 2 to Row n, I need to calculate the Euclidean distance between that row and each row above it, using the values in columns CURRENT_FIX_X and CURRENT_FIX_Y. If a distance is less than 58.93, I need to add the CURRENT_FIX_INDEX value of the earlier row (not the current row) to a list. I then need to join that list into a comma-separated string and store it in a new column (refix_list) on the current row.
Example: I’m on Row 7, so I’m comparing Row 7 against Rows 6, 5, 4, 3, 2, and 1 of that group. If the distances between Row 7 and Rows 5, 3, and 1 are each less than 58.93, I want the refix_list column at Row 7 to hold a comma-separated string containing the CURRENT_FIX_INDEX values of those 3 rows.
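To make the rule concrete, here is a minimal sketch for a single group, assuming the rows are already in fixation order. The function name `earlier_fixations_within` and the sample coordinates are made up for illustration; only the 58.93 threshold and the meaning of the columns come from the description above.

```python
import math

THRESHOLD = 58.93  # distance cutoff from the task description


def earlier_fixations_within(xs, ys, fix_indices, threshold=THRESHOLD):
    """For each row, collect the CURRENT_FIX_INDEX of earlier rows closer than threshold."""
    result = []
    for i in range(len(xs)):
        close = []  # fresh list for every row so matches do not accumulate
        for j in range(i):  # only the rows above row i
            if math.hypot(xs[i] - xs[j], ys[i] - ys[j]) < threshold:
                close.append(fix_indices[j])
        result.append(",".join(map(str, close)))
    return result


# Hypothetical fixations for one group
xs = [100.0, 400.0, 110.0, 405.0, 105.0]
ys = [100.0, 300.0, 105.0, 310.0, 102.0]
fix_indices = [1, 2, 3, 4, 5]

refix = earlier_fixations_within(xs, ys, fix_indices)
print(refix)
```

The first row of a group has nothing above it, so its string is empty; the last row here is close to the first and third rows, so it gets both of their index values.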
Problem: I have code that I’m working with, but I’m not sure whether it’s working, because I get ‘ValueError: Length of values (0) does not match length of index (297)’ when I try to print the df. So I know there’s an issue either in creating the list or, more likely, in concatenating it into a string and assigning it to the specific row.
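This particular message is what pandas raises when a new column is assigned a list whose length differs from the number of rows, for example an empty list. A small sketch that reproduces it in isolation, using a made-up three-row frame named `demo` (not the real data):

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data
demo = pd.DataFrame({"CURRENT_FIX_INDEX": [1, 2, 3]})

try:
    demo["refix_list"] = []  # zero values offered for three rows
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Assigning a scalar instead is broadcast to every row
demo["refix_list"] = ""
```

If the goal is only to create the column before filling it, a scalar placeholder such as an empty string avoids the length check.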
Here’s the code I’m working with:
import numpy as np

# Define a function to calculate Euclidean distance
def euclidean_distance(x1, y1, x2, y2):
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)
# Grouping the DataFrame by RECORDING_SESSION_LABEL, trial, and IP_INDEX
grouped = df.groupby(['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX'])
# List to store CURRENT_FIX_INDEX for each row
index_list = []
refix_values = []
# Iterate over each group
for group_name, group_df in grouped:
    # Sort the group_df by some unique column
    group_df = group_df.sort_values(by='trial')
    # Calculate Euclidean distance for each row
    for i, row in group_df.iterrows():
        current_x = row['CURRENT_FIX_X']
        current_y = row['CURRENT_FIX_Y']
        # Calculate distance with every row above it
        for j, prev_row in group_df.iloc[:i].iterrows():
            current_index = prev_row['CURRENT_FIX_INDEX']
            prev_x = prev_row['CURRENT_FIX_X']
            prev_y = prev_row['CURRENT_FIX_Y']
            distance = euclidean_distance(current_x, current_y, prev_x, prev_y)
            # If distance is less than or equal to 58.93, store CURRENT_FIX_INDEX
            if distance <= 58.93:
                index_list.append(current_index)
        refix_values.append(','.join(map(str, index_list)))  # Add list of matching INDEX values to list of lists
df['refix_list'] = []
# Iterate over the DataFrame to access each row and its index
for index, row in df.iterrows():
    # Assign the list to the current row in the specified column
    df.at[index, refix_list] = refix_values
print(df)
From my limited knowledge, I’m guessing the issue is in the last block of code, but I’m not positive. Any help is appreciated!
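One possible way to restructure this is sketched below; it is not a confirmed fix for the real data. It makes several assumptions: CURRENT_FIX_INDEX gives the order of rows within a group, the helper name `refix_strings` and the sample frame are invented for illustration, and the comparison is a strict `<` as in the task description (the posted code uses `<=`). The sketch builds one string per row, resets the matches for every row, slices by position rather than by index label, and writes the results back by index alignment instead of through a row-by-row loop.

```python
import numpy as np
import pandas as pd

THRESHOLD = 58.93  # cutoff from the task description
KEYS = ["RECORDING_SESSION_LABEL", "trial", "IP_INDEX"]


def refix_strings(group: pd.DataFrame) -> pd.Series:
    """Per row, join the CURRENT_FIX_INDEX of earlier rows within THRESHOLD."""
    # Assumption: CURRENT_FIX_INDEX gives the row order within a group
    group = group.sort_values("CURRENT_FIX_INDEX")
    xy = group[["CURRENT_FIX_X", "CURRENT_FIX_Y"]].to_numpy(dtype=float)
    fix_index = group["CURRENT_FIX_INDEX"].to_numpy()

    # Pairwise distances: dist[i, j] is the distance between rows i and j
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    out = []
    for i in range(len(group)):
        # Positional slice of the rows above row i, filtered by distance
        close = fix_index[:i][dist[i, :i] < THRESHOLD]
        out.append(",".join(map(str, close)))
    # Keep the original index labels so the result lines up with df
    return pd.Series(out, index=group.index, dtype=object)


# Hypothetical data: two groups with invented values
df = pd.DataFrame({
    "RECORDING_SESSION_LABEL": ["s1"] * 4 + ["s2"] * 2,
    "trial": [1] * 6,
    "IP_INDEX": [1] * 6,
    "CURRENT_FIX_INDEX": [1, 2, 3, 4, 1, 2],
    "CURRENT_FIX_X": [100.0, 400.0, 110.0, 105.0, 100.0, 120.0],
    "CURRENT_FIX_Y": [100.0, 300.0, 105.0, 102.0, 100.0, 100.0],
})

pieces = [refix_strings(g) for _, g in df.groupby(KEYS)]
df["refix_list"] = pd.concat(pieces)  # assignment aligns on index labels
print(df[["CURRENT_FIX_INDEX", "refix_list"]])
```

Because each returned Series carries the group's original index labels, the assignment places every string on the correct row regardless of the order in which the groups are processed.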