I use apply to loop over the rows and collect the names of the feature columns (feat1_tax, feat2_move, feat3_coffee) whose value is 1, but only for rows where scored is 0. The matching column names are joined into a new column called reason; rows where scored is 1 get None.
This solution doesn’t scale to larger datasets, so I’m looking for a faster approach. How can I do that?
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'feat1_tax': [1, 0, 0],
                   'feat2_move': [1, 0, 0],
                   'feat3_coffee': [0, 1, 0],
                   'scored': [0, 0, 1]})
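def get_not_scored_reason(row):
    # Candidate columns are those whose name starts with 'feat'
    exclusions_list = [col for col in df.columns if col.startswith('feat')]
    # Keep only the feature columns that are flagged (== 1) in this row
    reasons = [col for col in exclusions_list if row[col] == 1]
    return ', '.join(reasons) if reasons else None

# Only rows that were not scored (scored == 0) get a reason
df['reason'] = df.apply(lambda row: get_not_scored_reason(row) if row['scored'] == 0 else None, axis=1)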
print(df)
   ID  feat1_tax  feat2_move  feat3_coffee  scored                 reason
0   1          1           1             0       0  feat1_tax, feat2_move
1   2          0           0             1       0           feat3_coffee
2   3          0           0             0       1                   None
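
One direction that might be faster, shown here only as a sketch and not benchmarked: compute the list of feat columns once, pull the flags out as NumPy boolean arrays, and build the strings in a plain list comprehension, so that no pandas Series has to be constructed for every row. The names feat_cols, names, flags, not_scored and reasons are illustrative, and the sketch assumes the feat columns only ever contain 0 or 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'feat1_tax': [1, 0, 0],
                   'feat2_move': [1, 0, 0],
                   'feat3_coffee': [0, 1, 0],
                   'scored': [0, 0, 1]})

# Compute the candidate columns once instead of once per row.
feat_cols = [col for col in df.columns if col.startswith('feat')]
names = np.array(feat_cols, dtype=object)

# Boolean arrays: which features are flagged, and which rows were not scored.
flags = df[feat_cols].to_numpy() == 1          # shape (n_rows, n_feats)
not_scored = df['scored'].to_numpy() == 0      # shape (n_rows,)

# Join the flagged column names for each not-scored row; everything else gets None.
reasons = [
    ', '.join(names[row]) if ok and row.any() else None
    for row, ok in zip(flags, not_scored)
]

# dtype=object keeps None as None rather than converting it to a missing-value marker.
df['reason'] = pd.Series(reasons, index=df.index, dtype=object)
print(df)
```

On the sample frame this produces the same reason values as the apply version. The loop over rows is still there, but it runs over NumPy arrays rather than row Series, which is usually where most of the apply overhead comes from.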