I have a pandas DataFrame df that looks like this:
Class_ID Student_ID theta
6 1 0.2
6 2 0.2
6 4 0.1
6 3 0.5
3 2 0.1
3 5 0.2
3 7 0.22
3 4 0.4
3 9 0.08
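For reproducibility, the sample data above can be built as follows. This is only a convenience sketch; the column names and values are copied from the table, and the integer/float dtypes are an assumption.

```python
import pandas as pd

# Sample data copied from the table above (dtypes are assumed)
df = pd.DataFrame({
    "Class_ID":   [6, 6, 6, 6, 3, 3, 3, 3, 3],
    "Student_ID": [1, 2, 4, 3, 2, 5, 7, 4, 9],
    "theta":      [0.2, 0.2, 0.1, 0.5, 0.1, 0.2, 0.22, 0.4, 0.08],
})
print(df)
```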
I want to create a new DataFrame that contains every combination of two Student_IDs within the same Class_ID:
Class_ID Student_ID_x theta_x Student_ID_y theta_y Student_Combination
6 1 0.20 2 0.20 1_2
6 1 0.20 4 0.10 1_4
6 1 0.20 3 0.50 1_3
6 2 0.20 4 0.10 2_4
6 2 0.20 3 0.50 2_3
6 3 0.50 4 0.10 3_4
3 2 0.10 5 0.20 2_5
3 2 0.10 7 0.22 2_7
3 2 0.10 4 0.40 2_4
3 2 0.10 9 0.08 2_9
3 5 0.20 7 0.22 5_7
3 5 0.20 9 0.08 5_9
3 7 0.22 9 0.08 7_9
3 4 0.40 5 0.20 4_5
3 4 0.40 7 0.22 4_7
3 4 0.40 9 0.08 4_9
My code for this is:
df_new = df.merge(
    df,
    how='inner',
    on=['Class_ID']
)
df_new['Student_ID_x'] = df_new['Student_ID_x'].astype(int)
df_new['Student_ID_y'] = df_new['Student_ID_y'].astype(int)
# Keep each unordered pair once; .copy() avoids a SettingWithCopyWarning on the next assignment
df_new = df_new[df_new['Student_ID_x'] < df_new['Student_ID_y']].copy()
df_new['Student_Combination'] = [f'{x}_{y}' for x, y in zip(df_new['Student_ID_x'], df_new['Student_ID_y'])]
This works fine. However, for the new DataFrame df_new, I want to create a new column called feature for every Student_Combination by applying a function defined as follows:
def func(theta_x, theta_y, *theta):
    numerator = theta_x + theta_y
    denominator = 0
    for t in theta:
        denominator += t*t
    return numerator / denominator
where the *theta values are the thetas of the other students in the same Class_ID, that is, everyone except the two students in the pair. For example, for Student_Combination = 1_2 in Class 6, the feature equals (0.2+0.2)/(0.1^2+0.5^2) = 1.53846154. The desired output looks like this:
Class_ID Student_ID_x theta_x Student_ID_y theta_y Student_Combination feature
6 1 0.20 2 0.20 1_2 1.53846154
6 1 0.20 4 0.10 1_4 1.03448276
6 1 0.20 3 0.50 1_3 14
6 2 0.20 4 0.10 2_4 1.03448276
6 2 0.20 3 0.50 2_3 14
6 3 0.50 4 0.10 3_4 7.5
3 2 0.10 5 0.20 2_5 1.39664804
3 2 0.10 7 0.22 2_7 1.5503876
3 2 0.10 4 0.40 2_4 5.2742616
3 2 0.10 9 0.08 2_9 0.72463768
3 5 0.20 7 0.22 5_7 2.38095238
3 5 0.20 9 0.08 5_9 1.28205128
3 7 0.22 9 0.08 7_9 1.42857143
3 4 0.40 5 0.20 4_5 9.25925926
3 4 0.40 7 0.22 4_7 10.9929078
3 4 0.40 9 0.08 4_9 4.87804878
I have no idea how to do this. Thanks in advance.
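One possible approach, offered as a minimal sketch rather than a definitive answer: the denominator for a pair equals the sum of theta^2 over the whole class minus the squares of the pair's own two thetas, so func does not have to be called row by row. The sketch assumes each Student_ID appears at most once per Class_ID and that every class has at least three students (otherwise the denominator is zero). The names pairs and class_sq_sum are introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Class_ID":   [6, 6, 6, 6, 3, 3, 3, 3, 3],
    "Student_ID": [1, 2, 4, 3, 2, 5, 7, 4, 9],
    "theta":      [0.2, 0.2, 0.1, 0.5, 0.1, 0.2, 0.22, 0.4, 0.08],
})

# Pair every student with every other student in the same class,
# keeping each unordered pair once (same idea as the code in the question).
pairs = df.merge(df, on="Class_ID")
pairs = pairs[pairs["Student_ID_x"] < pairs["Student_ID_y"]].copy()
pairs["Student_Combination"] = (
    pairs["Student_ID_x"].astype(str) + "_" + pairs["Student_ID_y"].astype(str)
)

# Sum of theta^2 per class, indexed by Class_ID.
class_sq_sum = (df["theta"] ** 2).groupby(df["Class_ID"]).sum()

# Remove the pair's own contribution to get the sum over the "other" students.
denominator = (
    pairs["Class_ID"].map(class_sq_sum)
    - pairs["theta_x"] ** 2
    - pairs["theta_y"] ** 2
)
pairs["feature"] = (pairs["theta_x"] + pairs["theta_y"]) / denominator

print(pairs.reset_index(drop=True))
```

For the 1_2 pair in Class 6 this gives 0.4 / 0.26, matching the worked example above. The row order follows the merge and may differ from the order in the desired output. If func must be used as written, a slower alternative would be to collect the remaining thetas for each row and call func(theta_x, theta_y, *others).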