I have the following input as a column of a df; each row is one string:
surname: Chardon firstname: Marie occupation: idem link: fille age: 30
surname: Lhopital firstname: Louis-Jean occupation: sp link: chef age: 67
surname: Lavocat firstname: Marie link: femme birth_date: 1875 lob: Rigny
I want to transform this column into its own df with the attributes 'surname', 'firstname', 'occupation', 'link', 'age', 'lob', 'birth_date'.
The end result should look like this:
surname | firstname | occupation | link |
---|---|---|---|
Chardon | Marie | idem | fille |
Lhopital | Louis-Jean | sp | chef |
How can I write a function that transforms the input above into the df below it?
Essentially, I want to take the substring between each attribute name and the next attribute name and assign it as the value of the attribute that precedes it.
Difficulties:
- some values contain spaces (e.g., “Pierre Louis” as first name), so splitting on spaces and pairing up every two entries doesn’t work
- the list of attributes differs from entry to entry
What I have tried:
- various .split() configurations
- regex
- ChatGPT, but to no avail
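
To make the idea described above concrete (take the text between one attribute name and the next), here is a minimal sketch using only the standard `re` module. It assumes the full set of attribute names is known in advance and that no value itself contains an attribute name followed by a colon. The names `ATTRIBUTES`, `PATTERN`, and `parse_row` are hypothetical and only serve the illustration.

```python
import re

# Attribute names taken from the list above; assumed to be known in advance.
ATTRIBUTES = ["surname", "firstname", "occupation", "link", "age", "lob", "birth_date"]

# Match "name: value", where the value runs lazily up to the next
# "name:" or to the end of the string. Values may therefore contain spaces.
_names = "|".join(re.escape(a) for a in ATTRIBUTES)
PATTERN = re.compile(rf"\b({_names}):\s*(.*?)\s*(?=\b(?:{_names}):|$)")


def parse_row(text):
    """Return a dict mapping each attribute found in `text` to its value."""
    return {key: value for key, value in PATTERN.findall(text)}


rows = [
    "surname: Chardon firstname: Marie occupation: idem link: fille age: 30",
    "surname: Lhopital firstname: Louis-Jean occupation: sp link: chef age: 67",
    "surname: Lavocat firstname: Marie link: femme birth_date: 1875 lob: Rigny",
]

# One dict per row; attributes missing from a row are simply absent.
records = [parse_row(r) for r in rows]
```

If pandas is available, `pd.DataFrame(records, columns=ATTRIBUTES)` should turn these dicts into a df with one column per attribute, leaving `NaN` where a row lacks an attribute. All values stay strings here, so columns such as 'age' would still need converting if numbers are wanted.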