I have two pandas dataframes with identical structure but different customer_ids:
customer_id | gender | age | region |
---|---|---|---|
values | values | values | values |
I ran a groupby on dataframe_1 to count the number of customers in each group and got the distribution as a dataframe:
pd.DataFrame(dataframe_1.groupby(['gender', 'age', 'region'])['customer_id'].count()).reset_index()
It looks like this:
gender | age | region | customer_id |
---|---|---|---|
M | 18-20 | America | 10 |
… | … | … | … |
F | 60+ | Europe | 20 |
Is there a way to use this distribution on dataframe_2 to get the matching rows in a separate dataframe: 10 males aged 18-20 from region ‘America’, 20 females aged 60+ from region ‘Europe’, etc.?
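
To make the goal concrete, here is a minimal sketch of what I am after, on made-up toy data. The sample values, the `keys`, `targets`, `parts` and `matched` names, the choice of random sampling within each group, and the fixed `random_state` are all my own assumptions for illustration. If dataframe_2 has fewer rows in a group than the target count, this sketch just takes what is available.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe_1 and dataframe_2.
dataframe_1 = pd.DataFrame({
    "customer_id": range(1, 7),
    "gender": ["M", "M", "M", "F", "F", "F"],
    "age": ["18-20", "18-20", "60+", "60+", "60+", "18-20"],
    "region": ["America", "America", "Europe", "Europe", "Europe", "America"],
})
dataframe_2 = pd.DataFrame({
    "customer_id": range(101, 113),
    "gender": ["M"] * 6 + ["F"] * 6,
    "age": ["18-20", "18-20", "18-20", "60+", "60+", "60+"] * 2,
    "region": ["America", "America", "America", "Europe", "Europe", "Europe"] * 2,
})

keys = ["gender", "age", "region"]

# Target count per (gender, age, region) group, taken from dataframe_1.
# The result is a plain dict keyed by tuples such as ("M", "18-20", "America").
targets = dataframe_1.groupby(keys)["customer_id"].count().to_dict()

# Walk over the same groups in dataframe_2 and draw the target number of rows.
parts = []
for key, group in dataframe_2.groupby(keys):
    n = min(targets.get(key, 0), len(group))  # cap at the rows actually available
    if n > 0:
        parts.append(group.sample(n=n, random_state=0))

# Separate dataframe holding the rows of dataframe_2 that follow the distribution.
matched = pd.concat(parts, ignore_index=True)
print(matched)
```

Groups that appear in dataframe_2 but not in the distribution are skipped, so `matched` only contains combinations that were counted in dataframe_1.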