I have 2 dfs, the first df has transactions that each have a category id, the categories are multi layered and the number of layers varies. The 2. df has the categories. for each category it has the category id and the parent id. I would like to prepare df 2 in such a way that i can merge it with df 1 and then have all the layers of the categories in df1.
DF1:
Transaction ID | Category_id |
---|---|
1 | Apples |
2 | Garlic |
3 | Car |
DF2
category_id | parent_id |
---|---|
Apples | Fruit |
Garlic | Veg |
Fruit | None |
Veg | None |
Car | None |
Desired Result
Transaction ID | Category_0 | Category_1 |
1 | Apples | Fruit |
…
3 | None | Car
representation of the dfs
I have tried different ways matching the dfs but each time i got to a solution where i have too loop over df1 and then loop over df2 which seems very inefficient.
3
Outer Merge will help:
import pandas as pd
file = '/content/20240905_Temp_Table_Analysis.xlsx' #Link to File
ds1 = pd.read_excel(file, sheet_name='Sheet2') # Table 1
ds2 = pd.read_excel(file, sheet_name='Sheet3') # Table 2
# Merge Tables
ds_1 = ds1.merge(ds2, how='outer', left_on='Category_id', right_on='category_id')
ds_2 = ds_1.drop('category_id', axis=1, inplace=False)
ds_2.head()
1