How to process a Parent / Child hierarchy using Python Pandas?

Hi guys ‘Big problem here’,

I have a csv file as input file, which represent the structure of a “Car”.
There are 14 columns In my input file, at the end of the process I need to get back all the columns.

Basically there are 3 mains columns to deal with to create the architecture of the “Car” :
“CAR_REFERENCE”, “PARENT”, “CHILD”

The original input file doesn’t have any ID. (Lines can be duplicated)

For example in A CAR, there are lots of same bolts and Screws mounted on the same assembly.

CAR_REFERENCE PARENT CHILD
CAR_1 WHEEL_1 BOLT
CAR_1 WHEEL_1 BOLT

This example is just here to show you guys that rows of the input file are not unique.

WHEEL_1 can obviously be part of many many different CAR_REFERENCE.

My final GOAL is :

  • First, to add a new “Level” column to visually see the hierarchy of the whole structure
    (Ex : Top Item = Level 1, first child = Level 2, first child of first child = Level 3 …)

Expected for this point :
(I’ve deliberately reduced the structure so as not to include too much information, but in my input file I have absolutely all the details and levels.)

CAR_REFERENCE PARENT CHILD LEVEL
CAR_1 -1 CHASSIS 1
CAR_1 CHASSIS REAR AXLE __2
CAR_1 REAR AXLE WHEEL HUB ___3
CAR_1 WHEEL HUB WHEEL_1 ____4
CAR_1 WHEEL HUB WHEEL_1 ____4
CAR_1 WHEEL HUB WHEEL_1 ____4
CAR_1 WHEEL HUB WHEEL_1 ____4
CAR_1 CHASSIS FRONT AXLE __2
  • Second, create another column to SORT the hierarchy by keeping the order (PowerBi don’t know the exact structure of my cars so I need to create a column only for sorting purpose, to keep the structure as it is). New column SORTED_ORDER
CAR_REFERENCE PARENT CHILD LEVEL SORTED_ORDER
CAR_1 -1 CHASSIS 1 1
CAR_1 CHASSIS REAR AXLE __2 1.1
CAR_1 REAR AXLE WHEEL HUB ___3 1.1.1
CAR_1 WHEEL HUB WHEEL_1 ____4 1.1.1.1
CAR_1 WHEEL HUB WHEEL_1 ____4 1.1.1.2
CAR_1 WHEEL HUB WHEEL_1 ____4 1.1.1.3
CAR_1 WHEEL HUB WHEEL_1 ____4 1.1.1.4
CAR_1 CHASSIS FRONT AXLE __2 1.2

**One of the probleme here is if we got : **

1.1 1.2 ... 1.9 1.10

**It will be sorted like : **

1.1 1.10 1.2 1.3 ...

So first I separate each “values” between the “.” to create segments, I check the max value of each segment and the make a condition :

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>
if segment.value < 10
Only one digit = 1.2.1.5
if segment.value < 100
Max value of segment 2 = 98
Only two digits = 1.2.02.5
Else
Max value of segment 3 = 102
Only Three digits = 1.2.02.006
</code>
<code> if segment.value < 10 Only one digit = 1.2.1.5 if segment.value < 100 Max value of segment 2 = 98 Only two digits = 1.2.02.5 Else Max value of segment 3 = 102 Only Three digits = 1.2.02.006 </code>

if segment.value < 10

Only one digit = 1.2.1.5


if segment.value < 100

Max value of segment 2 = 98
Only two digits = 1.2.02.5

Else

Max value of segment 3 = 102
Only Three digits = 1.2.02.006

So if a segment[x] value Hit 1.76.1

ALL the previous elements from the 1.1.1 to 1.76.1 will be on two digits like so :
1.01.1
1.02.1
1.03.1

Same thing when a segment hit 100 or more

Doing this I can keep the original structure Order while sorting.

Here is all my code, and What I tried to reach my two main objectives :

First I tried to create Unique ID’s by counting occurences of each child of the whole input file (1.8 millions lines) :

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>import pandas as pd
df = pd.read_csv('Original.csv', dtype=str)
child_counts = {}
child_unique_ids = {}
for index, row in df.iterrows():
child = row['CHILD']
if child not in child_counts:
child_counts[child] = 1
else:
child_counts[child] += 1
child_unique = f'Child_{child}_{child_counts[child]}'
df.loc[index, 'UNIQUE_CHILD'] = child_unique
if child not in child_unique_ids:
child_unique_ids[child] = child_unique
df['PARENT'] = df['PARENT'].map(lambda x: child_unique_ids[x] if x in child_unique_ids else x)
df.to_csv('Output.csv', index=False)
</code>
<code>import pandas as pd df = pd.read_csv('Original.csv', dtype=str) child_counts = {} child_unique_ids = {} for index, row in df.iterrows(): child = row['CHILD'] if child not in child_counts: child_counts[child] = 1 else: child_counts[child] += 1 child_unique = f'Child_{child}_{child_counts[child]}' df.loc[index, 'UNIQUE_CHILD'] = child_unique if child not in child_unique_ids: child_unique_ids[child] = child_unique df['PARENT'] = df['PARENT'].map(lambda x: child_unique_ids[x] if x in child_unique_ids else x) df.to_csv('Output.csv', index=False) </code>
import pandas as pd

df = pd.read_csv('Original.csv', dtype=str)

child_counts = {}
child_unique_ids = {}

for index, row in df.iterrows():
    child = row['CHILD']

    if child not in child_counts:
        child_counts[child] = 1
    else:
        child_counts[child] += 1

    child_unique = f'Child_{child}_{child_counts[child]}'
    df.loc[index, 'UNIQUE_CHILD'] = child_unique

    if child not in child_unique_ids:
        child_unique_ids[child] = child_unique

df['PARENT'] = df['PARENT'].map(lambda x: child_unique_ids[x] if x in child_unique_ids else x)

df.to_csv('Output.csv', index=False)

OUTPUT :

CAR_REFERENCE PARENT CHILD UNIQUE_CHILD
CAR_1 -1 CHASSIS CHASSIS_1
CAR_1 CHASSIS REAR AXLE REAR AXLE_1
CAR_1 REAR AXLE WHEEL HUB WHEEL HUB_1
CAR_1 WHEEL HUB WHEEL_1 WHEEL_1_1
CAR_1 WHEEL HUB WHEEL_1 WHEEL_1_2
CAR_1 WHEEL HUB WHEEL_1 WHEEL_1_3
CAR_1 WHEEL HUB WHEEL_1 WHEEL_1_4
CAR_1 CHASSIS FRONT AXLE FRONT AXLE_1

By doing this I’m losing Parent / Unique child hierarchy, because the UNIQUE_CHILD is no longer a PARENT.

So The last part of the code is replacing ‘PARENT’ with the first ‘UNIQUE_CHILD’ occurence :

CAR_REFERENCE PARENT CHILD UNIQUE_CHILD
CAR_1 -1 CHASSIS CHASSIS_1
CAR_1 CHASSIS_1 REAR AXLE REAR AXLE_1
CAR_1 REAR AXLE_1 WHEEL HUB WHEEL HUB_1
CAR_1 WHEEL HUB_1 WHEEL_1 WHEEL_1_1
CAR_1 WHEEL HUB_1 WHEEL_1 WHEEL_1_2
CAR_1 WHEEL HUB_1 WHEEL_1 WHEEL_1_3
CAR_1 WHEEL HUB_1 WHEEL_1 WHEEL_1_4
CAR_1 CHASSIS_1 FRONT AXLE FRONT AXLE_1

So the first STEP to create Unique ID is working.

Now, let’s create the ‘LEVEL’ and ‘Sorted_order’ columns :

Here is my code :

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>import pandas as pd
import networkx as nx
def make_hierarchy(g):
G = nx.from_pandas_edgelist(g, create_using=nx.DiGraph,
source='PARENT', target='UNIQUE_CHILD')
def dfs_with_level(node, level, order, parent_order, max_depth):
hierarchy_data = [(g.name, node, level, parent_order + f'{order:0{max_depth}d}')]
children = list(G.successors(node))
if children:
for i, child in enumerate(children, start=1):
hierarchy_data.extend(dfs_with_level(child, level + 1, i, parent_order + f'{order:0{max_depth}d}' + '.', max_depth))
return hierarchy_data
hierarchy_data = []
for node in G.nodes:
if not list(G.predecessors(node)):
max_depth = len(str(len(G)))
hierarchy_data.extend(dfs_with_level(node, 1, 1, '', max_depth))
df_hierarchy = pd.DataFrame(hierarchy_data, columns=['CAR_REFERENCE', 'UNIQUE_CHILD', 'Level', 'sorted_order'])
# Add underscores to represent hierarchy levels visually (reverse order)
max_level = df_hierarchy['Level'].max()
df_hierarchy['Level'] = df_hierarchy['Level'].apply(lambda x: '_' * (x - 1) + str(x))
return df_hierarchy
# Read data from CSV
df = pd.read_csv('Csvfrompreviouspart.csv', dtype=str)
out = df.groupby('CAR_REFERENCE', group_keys=False).apply(make_hierarchy)
# Export the DataFrame to a CSV file
out.to_csv('Output_step2.csv', index=False)
</code>
<code>import pandas as pd import networkx as nx def make_hierarchy(g): G = nx.from_pandas_edgelist(g, create_using=nx.DiGraph, source='PARENT', target='UNIQUE_CHILD') def dfs_with_level(node, level, order, parent_order, max_depth): hierarchy_data = [(g.name, node, level, parent_order + f'{order:0{max_depth}d}')] children = list(G.successors(node)) if children: for i, child in enumerate(children, start=1): hierarchy_data.extend(dfs_with_level(child, level + 1, i, parent_order + f'{order:0{max_depth}d}' + '.', max_depth)) return hierarchy_data hierarchy_data = [] for node in G.nodes: if not list(G.predecessors(node)): max_depth = len(str(len(G))) hierarchy_data.extend(dfs_with_level(node, 1, 1, '', max_depth)) df_hierarchy = pd.DataFrame(hierarchy_data, columns=['CAR_REFERENCE', 'UNIQUE_CHILD', 'Level', 'sorted_order']) # Add underscores to represent hierarchy levels visually (reverse order) max_level = df_hierarchy['Level'].max() df_hierarchy['Level'] = df_hierarchy['Level'].apply(lambda x: '_' * (x - 1) + str(x)) return df_hierarchy # Read data from CSV df = pd.read_csv('Csvfrompreviouspart.csv', dtype=str) out = df.groupby('CAR_REFERENCE', group_keys=False).apply(make_hierarchy) # Export the DataFrame to a CSV file out.to_csv('Output_step2.csv', index=False) </code>
import pandas as pd
import networkx as nx

def make_hierarchy(g):
    G = nx.from_pandas_edgelist(g, create_using=nx.DiGraph,
                                source='PARENT', target='UNIQUE_CHILD')
    
    def dfs_with_level(node, level, order, parent_order, max_depth):
        hierarchy_data = [(g.name, node, level, parent_order + f'{order:0{max_depth}d}')]
        children = list(G.successors(node))
        if children:
            for i, child in enumerate(children, start=1):
                hierarchy_data.extend(dfs_with_level(child, level + 1, i, parent_order + f'{order:0{max_depth}d}' + '.', max_depth))
        return hierarchy_data
    
    hierarchy_data = []
    for node in G.nodes:
        if not list(G.predecessors(node)):
            max_depth = len(str(len(G)))
            hierarchy_data.extend(dfs_with_level(node, 1, 1, '', max_depth))
    
    df_hierarchy = pd.DataFrame(hierarchy_data, columns=['CAR_REFERENCE', 'UNIQUE_CHILD', 'Level', 'sorted_order'])

    # Add underscores to represent hierarchy levels visually (reverse order)
    max_level = df_hierarchy['Level'].max()
    df_hierarchy['Level'] = df_hierarchy['Level'].apply(lambda x: '_' * (x - 1) + str(x))
    
    return df_hierarchy

# Read data from CSV
df = pd.read_csv('Csvfrompreviouspart.csv', dtype=str)

out = df.groupby('CAR_REFERENCE', group_keys=False).apply(make_hierarchy)

# Export the DataFrame to a CSV file
out.to_csv('Output_step2.csv', index=False)

Here is the problem :

The Input file is : 1 841 724 lines
The output is : 1 981 260 lines

(Why this difference ?)

AND

The sorted_order column is “Bugged” I mostly got two “segments” but the structure can have 12 levels so 12 segments.

Here is a sample of the .csv file after this :
I only have “Two LEVELS” on this product, but I know that there are many more
(There are approx. 2500 lines of structure for each CAR_REFERENCE)

Plain text
Copy to clipboard
Open code in new window
EnlighterJS 3 Syntax Highlighter
<code>CAR_REFERENCE,UNIQUE_CHILD,Level,sorted_order
CAR_1,Child_G2000_4,_2,001.001
CAR_1,Child_G1000_3,_2,001.002
CAR_1,Child_G5200_3,_2,001.003
CAR_1,Child_G7200_2,_2,001.004
CAR_1,Child_G9200_2,_2,001.005
CAR_1,Child_G0200_3,_2,001.006
CAR_1,Child_G1010_2,_2,001.007
CAR_1,Child_G7201_2,_2,001.008
CAR_1,Child_G6200_3,_2,001.009
</code>
<code>CAR_REFERENCE,UNIQUE_CHILD,Level,sorted_order CAR_1,Child_G2000_4,_2,001.001 CAR_1,Child_G1000_3,_2,001.002 CAR_1,Child_G5200_3,_2,001.003 CAR_1,Child_G7200_2,_2,001.004 CAR_1,Child_G9200_2,_2,001.005 CAR_1,Child_G0200_3,_2,001.006 CAR_1,Child_G1010_2,_2,001.007 CAR_1,Child_G7201_2,_2,001.008 CAR_1,Child_G6200_3,_2,001.009 </code>
CAR_REFERENCE,UNIQUE_CHILD,Level,sorted_order
CAR_1,Child_G2000_4,_2,001.001
CAR_1,Child_G1000_3,_2,001.002
CAR_1,Child_G5200_3,_2,001.003
CAR_1,Child_G7200_2,_2,001.004
CAR_1,Child_G9200_2,_2,001.005
CAR_1,Child_G0200_3,_2,001.006
CAR_1,Child_G1010_2,_2,001.007
CAR_1,Child_G7201_2,_2,001.008
CAR_1,Child_G6200_3,_2,001.009

So for some products the ‘Level’ column seems correct, also the ‘Sorted_order’ too, but for the most part are incorrect.

If this script was correct the goal was to merge back this dataframe with the previous one to get back all the columns from the original file.

If you guys can take a Look of all of this, I tried to give as much details as I can, I’m stuck since 3 weeks on this, trying every day something new.

Thank’s a lot for your HELP.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật