`Im trying to read an nested api. The record nodes is nested in level 3, and I need a subvalue of to the parent.
When reading the parent I get the object, but reading a sub key it returns a Key error:
JSON object:
data={ 'results':
[
{
'id':1,
"x": [{
"state": "Florida",
"shortname": "FL",
"info": {"governor": "Rick Scott"},
"counties": [
{"name": "Dade", "population": 12345},
{"name": "Broward", "population": 40000},
{"name": "Palm Beach", "population": 60000},
],
},
{
"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich"},
"counties": [
{"name": "Summit", "population": 1234},
{"name": "Cuyahoga", "population": 1337},
],
}],
"Y":
{'TEST':1}
},
{
'id':2,
"x":[{
"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich"},
"counties": [
{"name": "Summit", "population": 1234},
{"name": "Cuyahoga", "population": 1337},
],
}],
"Y":
{'TEST':1}
}
]
}
Json normalize call, which gives error: KeyError: “Key ‘TEST’ not found. To replace missing values of ‘TEST’ with np.nan, pass in errors=’ignore'”
result = pd.json_normalize(
record_path=data["results"],
meta=['x', "counties"][["x", "state"], ["x", "info", "governor"], "Y", ["Y", "TEST"]],
)
With error = “ignore” the following dataframe is returned.
name population x.state x.info.governor Y Y.TEST
0 Dade 12345 Florida Rick Scott {'TEST': 1} NaN
1 Broward 40000 Florida Rick Scott {'TEST': 1} NaN
2 Palm Beach 60000 Florida Rick Scott {'TEST': 1} NaN
3 Summit 1234 Ohio John Kasich {'TEST': 1} NaN
4 Cuyahoga 1337 Ohio John Kasich {'TEST': 1} NaN
5 Summit 1234 Ohio John Kasich {'TEST': 1} NaN
6 Cuyahoga 1337 Ohio John Kasich {'TEST': 1} NaN
2
You’ll need to do 2 levels to flatten it.
import pandas as pd
data={ 'results':
[
{
'id':1,
"x": [{
"state": "Florida",
"shortname": "FL",
"info": {"governor": "Rick Scott"},
"counties": [
{"name": "Dade", "population": 12345},
{"name": "Broward", "population": 40000},
{"name": "Palm Beach", "population": 60000},
],
},
{
"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich"},
"counties": [
{"name": "Summit", "population": 1234},
{"name": "Cuyahoga", "population": 1337},
],
}],
"Y":
{'TEST':1}
},
{
'id':2,
"x":[{
"state": "Ohio",
"shortname": "OH",
"info": {"governor": "John Kasich"},
"counties": [
{"name": "Summit", "population": 1234},
{"name": "Cuyahoga", "population": 1337},
],
}],
"Y":
{'TEST':1}
}
]
}
# Flatten 'x' with correct meta paths
x_flattened = pd.json_normalize(
data,
record_path=['results', 'x'],
meta=[['results', 'id'], ['results', 'Y', 'TEST']],
meta_prefix='meta_'
)
# Flatten 'counties' within each 'x' entry
counties_flattened_list = []
for idx, row in x_flattened.iterrows():
counties_flattened = pd.json_normalize(row['counties'])
counties_flattened['state'] = row['state']
counties_flattened['shortname'] = row['shortname']
counties_flattened['governor'] = row['info.governor']
counties_flattened['meta_results.id'] = row['meta_results.id']
counties_flattened['meta_results.Y.TEST'] = row['meta_results.Y.TEST']
counties_flattened_list.append(counties_flattened)
counties_flattened = pd.concat(counties_flattened_list, ignore_index=True)
Alternate:
y_result = pd.json_normalize(
data=data,
record_path=['results'],
)
counties_result = pd.json_normalize(
data=data,
record_path=['results', 'x', 'counties'],
meta=[
['results', 'id'],
['results', 'x', 'state'],
['results', 'x', 'shortname'],
['results', 'x', 'info', 'governor']
],
)
result = pd.merge(counties_result, y_result, how='outer', left_on='results.id', right_on='id')
Output:
print(counties_flattened.to_string())
name population state shortname governor meta_results.id meta_results.Y.TEST
0 Dade 12345 Florida FL Rick Scott 1 1
1 Broward 40000 Florida FL Rick Scott 1 1
2 Palm Beach 60000 Florida FL Rick Scott 1 1
3 Summit 1234 Ohio OH John Kasich 1 1
4 Cuyahoga 1337 Ohio OH John Kasich 1 1
5 Summit 1234 Ohio OH John Kasich 2 1
6 Cuyahoga 1337 Ohio OH John Kasich 2 1