I have a problem. I have a nested JSON
file:
json_data = '''
{
"appVersion": "",
"device": {
"model": ""
},
"bef": {
"catalog": ""
},
"data": [
{
"timestamp": "",
"label": "",
"category": ""
}
]
}
I would like to extract all data and if it is nested I would like it to be separated with a _
.
I have tried to normalise the nested JSON file. I use json_normalise
for this.
Unfortunately, the desired output is not what I want and need.
Furthermore, I want that there can be any possible number of nested values, so I tried to solve it with a loop.
How can I produce the desired output?
import pandas as pd
import json
json_data = '''
{
"appVersion": "0.0.3",
"device": {
"model": "Lenovo"
},
"bef": {
"catalog": "Manual"
},
"data": [
{
"timestamp": "2024-04-24 12:08:02.415077",
"label": "zuf",
"category": "50"
}
]
}
'''
parsed_json = json.loads(json_data)
def extract_metadata(json_data):
metadata = {}
for key, value in json_data.items():
if isinstance(value, dict):
for k, v in value.items():
metadata[f'{key}_{k}'] = v
else:
metadata[key] = value
return metadata
meta_data = extract_metadata(parsed_json)
df_main = pd.json_normalize(parsed_json['data'], sep='_')
df_meta = pd.DataFrame([meta_data])
df = pd.concat([df_main, df_meta], axis=1)
print(df)
What I got
timestamp label category appVersion device_model
0 2024-04-24 12:08:02.415077 zuf 50 0.0.3 Lenovo
bef_catalog data
0 Manual [{'timestamp': '2024-04-24 12:08:02.415077', '...
What I want
appVersion device_model bef_catalog data_timestamp data_label data_category
0.0.3 Lenovo Manual 2024-04-24 12:08:02.415 zuf 50