I have 2 dataframes:
df1 =
id | name |
---|---|
1 | name1 |
2 | name2 |
3 | name3 |
4 | name4 |
df2 =
id | total |
---|---|
1 | 10 |
1 | 24 |
1 | 33 |
2 | 14 |
2 | 21 |
3 | 30 |
4 | 1 |
4 | 29 |
4 | 31 |
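
For reproducibility, the two frames above could be built roughly like this. This is a minimal sketch that assumes pandas is imported as pd and that the columns are named exactly as in the tables:

```python
import pandas as pd

# df1: one row per id
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["name1", "name2", "name3", "name4"],
})

# df2: several totals per id
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "total": [10, 24, 33, 14, 21, 30, 1, 29, 31],
})
```
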
I want to remove rows from df1 when the largest 'total' for the corresponding id in df2 is greater than a certain value.
I tried building a boolean mask with DataFrame.apply, which was very slow:
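def fn_should_drop(row, check_df):
    # Rows of check_df that share this row's id
    df_match = check_df.loc[check_df["id"] == row["id"]]
    # Largest total recorded for that id
    max_for_id = df_match["total"].max()
    max_value = 25
    return max_for_id >= max_value

# True where the row should be dropped
mask = df1.apply(fn_should_drop, check_df=df2, axis=1)
# Keep only the rows that are not flagged
df_result = df1[~mask]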
Is there a way to achieve this using vectorization (e.g. np.where)?
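
One direction that might work, shown only as a sketch: compute the per-id maximum once with groupby, then filter df1 with isin instead of calling a function per row. The threshold of 25 and the >= comparison are taken from the attempt above; the names max_per_id and ids_to_drop are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["name1", "name2", "name3", "name4"],
})
df2 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "total": [10, 24, 33, 14, 21, 30, 1, 29, 31],
})

max_value = 25  # threshold, same value as in the apply-based attempt

# Largest total per id, computed in a single pass over df2
max_per_id = df2.groupby("id")["total"].max()

# ids whose largest total reaches the threshold (hypothetical helper name)
ids_to_drop = max_per_id.index[max_per_id >= max_value]

# Keep the rows of df1 whose id is not flagged
df_result = df1[~df1["id"].isin(ids_to_drop)]
print(df_result)
```

With the sample data only id 2 should remain, since ids 1, 3 and 4 each have at least one total of 25 or more. Because "the maximum is at least the threshold" is the same as "some row is at least the threshold", selecting df2.loc[df2["total"] >= max_value, "id"] should give the same ids without the groupby. Any id in df1 that has no rows in df2 is kept under this approach.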