import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C', 'id'])
df['id'] = [i for i in range(len(df))]
df['A'] = ['a1', 'a2', 'a3', 'a3', 'a2', 'a1', 'a2']
df['B'] = ['b2', 'b1', 'b1', 'b2', 'b3', 'b2', 'b1']
df['C'] = ['c1', 'c3', 'c4' ,'c2', 'c1', 'c3', 'c2']
print(df)
list_dict_cols = [{'A': ['a1', 'a2'], 'B': ['b2']}, {'A': ['a1'], 'B': ['b3'], 'C': ['c2']}, {'C': ['c1']}]
df_res = pd.DataFrame()
for dict_cols in list_dict_cols:
df_tem = df.copy()
for key, val in dict_cols.items():
df_tem = df_tem[df_tem[key].isin(val)]
df_res = pd.concat([df_res, df_tem])
df_res.drop_duplicates(inplace=True)
print(df_res)
I have some trades with three categorical columns:
'A': ['a1', 'a2', 'a3'], 'B': ['b1', 'b2', 'b3'], 'C': ['c1', 'c2', 'c3' ,'c4']
Now I have some filters:
list_dict_cols = [{'A': ['a1', 'a2'], 'B': ['b2']}, {'A': ['a1'], 'B': ['b3'], 'C': ['c2']}, {'C': ['c1']}]
I want to obtain the union of trades under those filters. Now I use the most naive way above by enumerating the filters and merging them and removing the duplicates.
Is there any generic way that first merge on the filers. For example,
-
if
{'B': ['b2']}
is in the filters, then we can discard any subset like{'B': ['b2'], C: ['c2']}
, etc, -
if
{'A': ['a1'], 'B': ['b1']}, {'A': ['a1'], 'B': ['b2']}, {'A': ['a1'], 'B': ['b3']}
are in, we can merge them as{'A': ['a1']}.
The goal for this is to reduce the filters since the real df
may be very large.