I have a dataframe like the following, where each row shows the relationship between different entities.
Child | Parent | Ult_Parent | Full_Family |
---|---|---|---|
A032 | A001 | A039 | A001, A032, A039, A040, A041, A043, A043, A045, A046 |
The “Full_Family” column shows the correct hierarchy for the whole family tree, ordered from smallest to largest, not just the entities in that specific row. Note that not every entity in “Full_Family” appears in the Child/Parent/Ult_Parent columns, because the “Full_Family” data comes from another source.
I have two problems to solve:
- The order of Child/Parent/Ult_Parent in each row is not correct. How can I use the “Full_Family” column to determine the correct order?
- If possible, can I also identify the correct Ult_Parent for each family across the whole table, instead of per row? This correct Ult_Parent must be an entity that appears in the Child/Parent/Ult_Parent columns.
Below is an example with the ideal outcome:
Child | Parent | Ult_Parent | Full_Family | Correct_Order | Correct_Ult_Parent_per_Family |
---|---|---|---|---|---|
A032 | A001 | A039 | A001, A032, A039, A040, A041, A043, A043, A045, A046 | A001, A032, A039 | A043 |
A001 | A043 | A039 | A001, A032, A039, A040, A041, A043, A043, A045, A046 | A001, A039, A043 | A043 |
Although A046 is the ultimate parent in this family, it does not appear in this dataframe’s Child/Parent/Ult_Parent columns, so the correct ultimate parent per family is A043 in this case.
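To make the desired logic concrete, here is a minimal sketch of one possible approach. It assumes pandas, that “Full_Family” is a comma-separated string ordered from smallest to largest, and that the string itself identifies a family. The helper names `family_rank` and `correct_order` are made up for illustration, and sorting entities that are missing from “Full_Family” to the end is also an assumption.

```python
import pandas as pd

# Sample data copied from the example table above.
family = "A001, A032, A039, A040, A041, A043, A043, A045, A046"
df = pd.DataFrame({
    "Child": ["A032", "A001"],
    "Parent": ["A001", "A043"],
    "Ult_Parent": ["A039", "A039"],
    "Full_Family": [family, family],
})

cols = ["Child", "Parent", "Ult_Parent"]

def family_rank(full_family):
    """Map each entity to its first position in the family string."""
    rank = {}
    for pos, entity in enumerate(e.strip() for e in full_family.split(",")):
        rank.setdefault(entity, pos)  # keep the first position for duplicates
    return rank

def correct_order(row):
    """Sort the row's three entities by their position in Full_Family."""
    rank = family_rank(row["Full_Family"])
    # Assumption: entities not found in Full_Family are placed last.
    ordered = sorted(row[cols].tolist(), key=lambda e: rank.get(e, float("inf")))
    return ", ".join(ordered)

df["Correct_Order"] = df.apply(correct_order, axis=1)

# Per family: the highest-ranked entity that actually appears in the
# Child/Parent/Ult_Parent columns anywhere in the table.
ult = {}
for fam, group in df.groupby("Full_Family"):
    rank = family_rank(fam)
    present = set(group[cols].to_numpy().ravel())
    known = [e for e in present if e in rank]
    ult[fam] = max(known, key=rank.get) if known else None

df["Correct_Ult_Parent_per_Family"] = df["Full_Family"].map(ult)
print(df[["Correct_Order", "Correct_Ult_Parent_per_Family"]])
```

On the two sample rows this gives `A001, A032, A039` and `A001, A039, A043` for Correct_Order, and `A043` for both rows in Correct_Ult_Parent_per_Family, matching the ideal outcome above.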
Appreciate the help.