In a previous question, I wanted to create and recursively sort a Parent/Child hierarchy.
With a lot of help from one member, I got a working solution.
But, yes, there is a but: the input file is a .csv file with 18 columns.
The previous solution only uses two columns (Parent and Child), and the output file only gets these two columns plus the column we created to represent the hierarchy.
Here is the link to the previous question:
/questions/78364917/sort-hierarchic-parent-child-list-in-python
My goal was simply to add my new column (created with the previous answer) and keep all the original data.
The problem is: the data in the original .csv file doesn't have any unique ID, so I can't merge the old and new DataFrames together.
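Even without an ID column in the file, pandas can supply one: the row position right after `read_csv` is unique, so it can be turned into a column before any grouping or sorting and used as a merge key afterwards. A minimal sketch, assuming only the three column names from the question are real; `QTY`, `row_id` and the `computed` frame are hypothetical stand-ins:

```python
import pandas as pd

# Toy stand-in for the real 18-column file; QTY is a hypothetical extra column.
df = pd.DataFrame({
    "REF_PRODUIT": ["P1", "P1", "P1"],
    "REF_ARTICLE_PERE": ["A", "A", "B"],
    "REF_ARTICLE_FILS": ["B", "C", "D"],
    "QTY": [2, 1, 5],
})

# The file has no unique ID, but the default RangeIndex is one: turn it into a column
# (hypothetically named row_id) before any groupby/sort changes the row order.
df = df.reset_index().rename(columns={"index": "row_id"})

# Suppose a computation returns only (row_id, new column), in any order ...
computed = pd.DataFrame({"row_id": [2, 0, 1], "Level": [3, 2, 2]})

# ... a left merge on row_id then brings back every original column.
out = df.merge(computed, on="row_id", how="left")
print(out)
```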
I've also tried to add the columns manually; here is the code:
```python
import networkx as nx
import pandas as pd

def make_hierarchy(g):
    # edge_attr=True stores every column other than source/target on the edges
    G = nx.from_pandas_edgelist(g, create_using=nx.DiGraph,
                                source='REF_ARTICLE_PERE', target='REF_ARTICLE_FILS', edge_attr=True)

    def dfs_with_level(node, level, order, parent_order, max_depth):
        # Here I tried to modify the hierarchy_data skeleton, but with no success
        hierarchy_data = [(g.name, node, level, parent_order + f'{order:0{max_depth}d}')]
        children = list(G.successors(node))
        if children:
            for i, child in enumerate(children, start=1):
                hierarchy_data.extend(dfs_with_level(child, level + 1, i, parent_order + f'{order:0{max_depth}d}' + '.', max_depth))
        return hierarchy_data

    hierarchy_data = []
    for node in G.nodes:
        # Roots are the nodes without a parent
        if not list(G.predecessors(node)):
            max_depth = len(str(len(G)))
            hierarchy_data.extend(dfs_with_level(node, 1, 1, '', max_depth))

    # Here I tried to add columns to the DataFrame, but I got an error along the lines of "4 columns passed, 5 needed"
    df_hierarchy = pd.DataFrame(hierarchy_data, columns=['REF_PRODUIT', 'REF_ARTICLE_FILS', 'Level', 'sorted_order'])

    # Add underscores to represent hierarchy levels visually
    max_level = df_hierarchy['Level'].max()
    df_hierarchy['Level'] = df_hierarchy['Level'].apply(lambda x: '_' * (x - 1) + str(x))
    return df_hierarchy

df = pd.read_csv('input.csv')
out = df.groupby('REF_PRODUIT', group_keys=False).apply(make_hierarchy)
out.to_csv('output.csv', index=False)
```
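About the column-count error mentioned in the code comment: `pd.DataFrame(list_of_tuples, columns=[...])` requires every tuple to have exactly as many items as there are names in `columns`, so adding a field to the tuple without adding a name (or vice versa) fails. One way around it, offered as a suggestion rather than as the original approach, is to emit one dict per row and let pandas take the column names from the keys. `QTY` is a hypothetical extra field:

```python
import pandas as pd

# One dict per output row: the keys ARE the column names, so there is no separate
# `columns=[...]` list that can fall out of sync with the number of fields.
hierarchy_data = [
    {"REF_PRODUIT": "P1", "REF_ARTICLE_FILS": "A", "Level": 1, "sorted_order": "1"},
    # A hypothetical extra field only needs to be added here.
    {"REF_PRODUIT": "P1", "REF_ARTICLE_FILS": "B", "Level": 2, "sorted_order": "1.1", "QTY": 2},
]

# Rows missing a key (the first one has no QTY) get NaN in that column.
df_hierarchy = pd.DataFrame(hierarchy_data)
print(df_hierarchy)
```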
I set `edge_attr=True` to keep all the columns in the graph, but the problem comes after this step.
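To see what `edge_attr=True` actually stores, a small experiment helps. In this sketch (toy data; `QTY` is a hypothetical column) every column other than source and target ends up on the edge it came from. One caveat: a `DiGraph` keeps a single edge per (parent, child) pair, so if the same pair appears twice within one `REF_PRODUIT`, the later row's values overwrite the earlier ones.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "REF_PRODUIT": ["P1", "P1", "P1"],
    "REF_ARTICLE_PERE": ["A", "A", "A"],
    "REF_ARTICLE_FILS": ["B", "C", "B"],  # the pair A -> B appears twice
    "QTY": [2, 1, 7],                     # hypothetical extra column
})
G = nx.from_pandas_edgelist(df, source="REF_ARTICLE_PERE", target="REF_ARTICLE_FILS",
                            edge_attr=True, create_using=nx.DiGraph)

# Every column except source/target is stored on the edge of its row.
print(G.edges["A", "C"])

# Only one A -> B edge exists; its attributes come from the last matching row.
print(G.number_of_edges(), G.edges["A", "B"]["QTY"])
```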
While iterating over the nodes, I would like to keep the data of the current row and add it to `df_hierarchy`.
In the debugger I can see the data in `G.adj.values`, but it contains the data of all rows, not only the current one.
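`G.adj` is the whole adjacency structure (node → successors → edge data), which is why it shows every row at once. Indexing it by one node, or asking for a single edge, narrows it down to the current row. A sketch on toy data, with `QTY` as a hypothetical column:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "REF_ARTICLE_PERE": ["A", "A", "B"],
    "REF_ARTICLE_FILS": ["B", "C", "D"],
    "QTY": [2, 1, 5],  # hypothetical extra column
})
G = nx.from_pandas_edgelist(df, source="REF_ARTICLE_PERE", target="REF_ARTICLE_FILS",
                            edge_attr=True, create_using=nx.DiGraph)

# G.adj["A"] keeps only the edges leaving A: {child: edge attribute dict}.
children_of_a = {child: data["QTY"] for child, data in G.adj["A"].items()}

# G.edges[parent, child] is the attribute dict of exactly one edge, i.e. one CSV row.
row_b_d = G.edges["B", "D"]
print(children_of_a, row_b_d)
```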
And even if I find a way to get the data of the current row (to add it to the new DataFrame), I'm not sure I would be able to add it…
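One possible way to put the pieces together, sketched under assumptions and untested against the real 18-column file: the row data travels on the edges via `edge_attr=True`, and each output row is a dict that unpacks the edge's attributes, so the original columns come along without any merge. Root articles never appear as a child, so they have no source row and get NaN in the other columns. Unlike the code above, this sketch numbers roots 1, 2, … instead of always 1, adds the parent column back, and uses `QTY` as a hypothetical stand-in for the remaining columns:

```python
import networkx as nx
import pandas as pd

def make_hierarchy(product, g):
    # edge_attr=True copies every column except the two endpoints onto the edge,
    # so each edge (parent, child) carries the rest of the CSV row it came from.
    G = nx.from_pandas_edgelist(g, source="REF_ARTICLE_PERE", target="REF_ARTICLE_FILS",
                                edge_attr=True, create_using=nx.DiGraph)
    width = len(str(len(G)))  # zero-padding width, like max_depth in the question
    rows = []

    def dfs(node, parent, level, order, edge_data):
        # Hierarchy fields first, then the original row's columns (empty dict for a root).
        rows.append({"REF_PRODUIT": product, "REF_ARTICLE_PERE": parent,
                     "REF_ARTICLE_FILS": node, "Level": level,
                     "sorted_order": order, **edge_data})
        for i, child in enumerate(G.successors(node), start=1):
            dfs(child, node, level + 1, f"{order}.{i:0{width}d}", G.edges[node, child])

    roots = [n for n in G.nodes if G.in_degree(n) == 0]
    for r, root in enumerate(roots, start=1):
        dfs(root, None, 1, f"{r:0{width}d}", {})
    return rows

df = pd.DataFrame({
    "REF_PRODUIT": ["P1", "P1", "P1"],
    "REF_ARTICLE_PERE": ["A", "A", "B"],
    "REF_ARTICLE_FILS": ["B", "C", "D"],
    "QTY": [2, 1, 5],  # hypothetical stand-in for the other columns
})

rows = []
for product, g in df.groupby("REF_PRODUIT", sort=False):
    rows.extend(make_hierarchy(product, g))
out = pd.DataFrame(rows)
print(out)
```

Building plain dicts also sidesteps the tuple/column-count mismatch, since pandas takes the column names from the keys.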
I'm new to data manipulation (I'm a student), so any help would be appreciated.