Hi, I am looking for an efficient way to normalise a DataFrame that contains a column of JSON data.
I get a JSON response from a website that is saved as a DataFrame.
The structure is shown below:
[screenshot of the DataFrame structure]
I have 3 standard columns; the 4th column holds JSON data that I want to normalise, creating new rows that repeat the values of the first 3 columns while splitting out the contents of the 4th column.
Something like below:
[screenshot of the desired output]
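Roughly, the input and the output I am after look like this (toy data; apart from the JSON column name, the column names and keys are made up since I can't share the real response):

import pandas as pd

# Toy version of what I have: 3 plain columns plus a column of JSON payloads
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "date": ["2023-01-01", "2023-01-02"],
    "document_anatomy__cr": [
        {"responseDetails": {"pagesize": 250, "pageoffset": 0, "size": 2, "total": 2},
         "data": [{"part": "x", "value": 10}, {"part": "y", "value": 20}]},
        {"responseDetails": {"pagesize": 250, "pageoffset": 0, "size": 1, "total": 1},
         "data": [{"part": "z", "value": 30}]},
    ],
})

# What I want: one row per element of 'data', with id/name/date repeated
#    id name        date part  value
#     1    a  2023-01-01    x     10
#     1    a  2023-01-01    y     20
#     2    b  2023-01-02    z     30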
I want to avoid looping through each row of the DataFrame, as there are over 20k rows and that could take a while.
Can someone point me in the right direction?
I believe the first challenge is stripping out the unwanted metadata from the 4th column, for example
{'responseDetails': {'pagesize': 250, 'pageoffset': 0, 'size': 8, 'total': 8},
so that I can normalise the JSON column into a DataFrame.
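For example, something like this might pull out just the 'data' part element-wise (only a sketch, assuming each cell already holds a parsed dict shaped like the toy example above):

# Sketch: Series.str.get does a per-element key lookup on dict values,
# so there is no explicit for loop in my own code
data_series = df["document_anatomy__cr"].str.get("data")

# If the column holds raw JSON strings instead of dicts, parse them first:
# import json
# data_series = df["document_anatomy__cr"].map(json.loads).str.get("data")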
I have tried converting the column to JSON, and I can get at the data, but I would then have to iterate over the 20k records with a for loop:
import json

# Serialise the column to one JSON string, parse it back, then pick out 'data'
split_col = df['document_anatomy__cr'].to_json(orient="records")
y = json.loads(split_col)
print(y[0]['data'])
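What I am hoping for is something vectorised along these lines instead (just a sketch based on the toy data above, not tested against the real payload): pull out the 'data' lists, explode them so each element gets its own row, then flatten the dicts with pd.json_normalize and join the result back onto the first 3 columns.

import pandas as pd

# Sketch: keep the three plain columns, give each element of 'data' its own
# row, then flatten the dicts and join them back on
work = df.drop(columns="document_anatomy__cr").assign(
    data=df["document_anatomy__cr"].str.get("data")
)
work = work.explode("data", ignore_index=True)

flat = pd.json_normalize(work["data"].tolist())
result = work.drop(columns="data").join(flat)
print(result.head())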