I have a dataframe like this:
import pandas as pd

data = {
    'Parent': [None, None, 'A', 'B', 'C', 'I', 'D', 'F', 'G', 'H', 'Z', 'Y', None, None, None, None, 'AA', 'BB', 'CC', 'EE', 'FF', None, None],
    'Child': ['A', 'B', 'D', 'D', 'D', 'C', 'E', 'E', 'F', 'F', 'G', 'H', 'Z', 'Y', 'AA', 'BB', 'CC', 'CC', 'DD', 'DD', 'DD', 'EE', 'FF']
}
df = pd.DataFrame(data)
   Parent Child
0    None     A
1    None     B
2       A     D
3       B     D
4       C     D
5       I     C
6       D     E
7       F     E
8       G     F
9       H     F
10      Z     G
11      Y     H
12   None     Z
13   None     Y
14   None    AA
15   None    BB
16     AA    CC
17     BB    CC
18     CC    DD
19     EE    DD
20     FF    DD
21   None    EE
22   None    FF
I want an output dataframe like so:
I tried using the networkx package, as suggested in this post. This is the code I used:
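import networkx as nx

# Replace missing parents with a placeholder label
df['Parent'] = df['Parent'].fillna('No Parent')
# Nodes that appear as a parent but never as a child
leaves = set(df['Parent']).difference(df['Child'])
g = nx.from_pandas_edgelist(df, 'Parent', 'Child', create_using=nx.DiGraph())
# Collect the ancestors of each of those nodes
ancestors = {
    n: nx.algorithms.dag.ancestors(g, n) for n in leaves
}
# One row per node, one column per ancestor
df1 = (pd.DataFrame.from_dict(ancestors, orient='index')
       .rename(lambda x: 'parent_{}'.format(x+1), axis=1)
       .rename_axis('Child')
       .fillna('')
)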
But I get an empty dataframe.
Is there an elegant way to achieve this?
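One possible explanation for the empty result, offered as a sketch rather than a confirmed fix: taking the parents and removing the children yields the nodes that are never a child, i.e. the roots, and a root has no ancestors to list. The snippet below assumes the goal is to list every ancestor of each leaf (a node that is never a parent); the intended output is not shown here, so that goal is a guess. It also assumes rows with a missing parent can simply be dropped before building the graph. The names `edges`, `roots`, and `out` are introduced only for illustration.

```python
import networkx as nx
import pandas as pd

data = {
    "Parent": [None, None, "A", "B", "C", "I", "D", "F", "G", "H", "Z", "Y",
               None, None, None, None, "AA", "BB", "CC", "EE", "FF", None, None],
    "Child": ["A", "B", "D", "D", "D", "C", "E", "E", "F", "F", "G", "H",
              "Z", "Y", "AA", "BB", "CC", "CC", "DD", "DD", "DD", "EE", "FF"],
}
df = pd.DataFrame(data)

# Assumption: rows without a parent carry no edge, so drop them
# instead of introducing a placeholder node.
edges = df.dropna(subset=["Parent"])
g = nx.from_pandas_edgelist(edges, "Parent", "Child", create_using=nx.DiGraph())

# Parents that are never a child are roots; they have no ancestors.
roots = set(edges["Parent"]) - set(edges["Child"])

# Children that are never a parent are leaves.
leaves = set(df["Child"]) - set(edges["Parent"])

# Sorted lists keep the column order deterministic.
ancestors = {n: sorted(nx.ancestors(g, n)) for n in sorted(leaves)}

# One row per leaf, one column per ancestor; shorter rows are padded with ''.
out = (
    pd.DataFrame.from_dict(ancestors, orient="index")
    .rename(columns=lambda i: f"parent_{i + 1}")
    .rename_axis("Child")
    .fillna("")
)
print(out)
```

With this sample data the leaves are `E` and `DD`. Note that `nx.ancestors` returns an unordered set, so the column order here is alphabetical and does not reflect depth in the hierarchy.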