Suppose you have a JSON file like this:
{
  "a": 0
}
{
  "a": 1
}
It’s not JSONL, because each object spans more than one line. But it’s not a single valid JSON document either. It’s a sequence of pretty-printed JSON objects, one after another.
json.loads in Python raises an error ("Extra data") if you attempt to load this, and the documentation indicates it only loads a single object. But tools like jq can read this kind of data without issue.
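A minimal reproduction of that failure, assuming CPython's json module (the exact message text is an implementation detail): json.loads parses the first object, then reports "Extra data" at the position where the second one starts.

```python
import json

text = '{\n  "a": 0\n}\n{\n  "a": 1\n}'

caught = None
try:
    json.loads(text)
except json.JSONDecodeError as err:
    caught = err  # keep a reference; `err` is unbound after the except block
    # err.msg is the bare message, err.pos the offset of the second "{"
    print(err.msg, err.pos, repr(text[err.pos]))  # Extra data 13 '{'
```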
Is there some reasonable way to work with data formatted like this using the core json library? I have some complex objects, and while formatting the data as JSONL works, it would be more readable to store it like this. I could wrap everything in a list to make it a single JSON document, but that has downsides, such as requiring the whole file to be read in at once.
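For reference, a file in this layout is what you get from writing each object with json.dumps(..., indent=2) followed by a newline. A small sketch, where `records` is a hypothetical list of objects and io.StringIO stands in for a real file opened for writing:

```python
import io
import json

records = [{"a": 0}, {"a": 1}]  # hypothetical data to store

out = io.StringIO()  # stand-in for open("data.json", "w")
for record in records:
    # one pretty-printed object per record, separated by a newline
    out.write(json.dumps(record, indent=2))
    out.write("\n")

pretty = out.getvalue()
print(pretty)
```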
There’s a similar question here, but despite the title the data there isn’t JSON at all.
You can partially decode text as JSON with json.JSONDecoder.raw_decode. This method returns a 2-tuple of the parsed object and the index in the string where the object ends, which you can then use as the starting index for decoding the next JSON object:
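import json

def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    # raw_decode does not skip leading whitespace, so jump to the next '{'
    while (index := jsons.find('{', index)) != -1:
        # decode one object starting at index; index becomes its end position
        data, index = decoder.raw_decode(jsons, index)
        yield data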
so that:
jsons = '''
{
  "a": 0
}
{
  "a": 1
}'''
for j in iter_jsons(jsons):
    print(j)
outputs:
{'a': 0}
{'a': 1}
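A variation on the same idea, sketched under the assumption that skipping JSON whitespace is all that is needed between values: advancing with a regex over JSON's four whitespace characters (space, tab, LF, CR) instead of searching for '{' also handles top-level arrays, strings and numbers, and it raises on stray text rather than silently skipping it. The name iter_json_values is hypothetical, and like the first version above it passes the start index as raw_decode's second argument.

```python
import json
import re
from typing import Iterator

# JSON's insignificant whitespace: space, tab, line feed, carriage return
_WS = re.compile(r"[ \t\n\r]*")


def iter_json_values(text: str) -> Iterator[object]:
    decoder = json.JSONDecoder()
    index = _WS.match(text, 0).end()
    while index < len(text):
        # decode one value at index; any non-JSON text raises JSONDecodeError
        obj, index = decoder.raw_decode(text, index)
        yield obj
        index = _WS.match(text, index).end()


print(list(iter_json_values('\n{"a": 0}\n[1, 2]\n')))  # [{'a': 0}, [1, 2]]
```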
Note that passing the starting index as the second argument to json.JSONDecoder.raw_decode relies on an implementation detail. If you want to stick to the publicly documented API, you have to use the less efficient approach of slicing the string from the index (which copies the string) before passing it to raw_decode:
def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    while (index := jsons.find('{', index)) != -1:
        # slice from the next '{' (a copy); the returned end index is relative to the slice
        data, index = decoder.raw_decode(jsons := jsons[index:])
        yield data
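Both versions above still need the whole text in memory, which is the downside the question mentions. One way around that, sketched here with only the documented raw_decode(s) form, is to read the file in chunks and retry decoding as the buffer grows. It assumes every top-level value is an object or array (a truncated number such as 12 out of 123 would otherwise decode too early); the function name iter_json_stream and the chunk size are illustrative choices, not part of the json module.

```python
import io
import json
from typing import IO, Iterator


def iter_json_stream(fp: IO[str], chunk_size: int = 65536) -> Iterator[object]:
    """Yield top-level JSON objects/arrays from a text stream of concatenated JSON."""
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while True:
        buf = buf.lstrip()  # raw_decode does not skip leading whitespace
        if buf:
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more input: the text is malformed or truncated
            else:
                yield obj
                buf = buf[end:]
                continue
        elif eof:
            return
        # the buffer is empty or holds an incomplete value: read more text
        chunk = fp.read(chunk_size)
        if not chunk:
            eof = True
        buf += chunk


stream = io.StringIO('{\n  "a": 0\n}\n{\n  "a": 1\n}\n')  # stand-in for open(path)
print(list(iter_json_stream(stream, chunk_size=4)))  # [{'a': 0}, {'a': 1}]
```

A value larger than the chunk size is re-decoded from its start after every read, so a larger chunk_size keeps that repeated work small.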
Here is a way: attempt json.loads(), then:
- If it succeeds, we have reached the end of the string.
- If not, load the object up to the error position, error.pos, and repeat with the rest of the text.
Code:
import json

text = """
{
  "a": 0
}
{
  "a": 1
}
"""

obj_list = []
while True:
    try:
        obj_list.append(json.loads(text))
        # Success means we have reached the end of the string
        break
    except json.decoder.JSONDecodeError as error:
        # error.pos is where the error happens within the text
        valid_text, text = text[:error.pos], text[error.pos:]
        obj_list.append(json.loads(valid_text))

print(obj_list)
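A generator form of the same approach, as a sketch: it yields objects one at a time and re-raises errors that are not about trailing data, instead of trying to parse up to a genuine syntax error. It assumes CPython's message text "Extra data" for the concatenation case, which is an implementation detail rather than documented API; iter_concatenated is a hypothetical name. As in the loop above, every object except the last is parsed twice and the remaining text is copied on each step, so this suits small inputs.

```python
import json
from typing import Iterator


def iter_concatenated(text: str) -> Iterator[object]:
    """Yield each JSON value from concatenated JSON text (sketch)."""
    while text.strip():
        try:
            obj = json.loads(text)
        except json.JSONDecodeError as err:
            # Assumption: CPython reports a second value as "Extra data";
            # anything else is a real syntax error, so let it propagate.
            if err.msg != "Extra data":
                raise
            obj = json.loads(text[:err.pos])  # the first value, parsed again
            text = text[err.pos:]
        else:
            text = ""  # the whole remainder was a single value
        yield obj


print(list(iter_concatenated('\n{\n  "a": 0\n}\n{\n  "a": 1\n}\n')))  # [{'a': 0}, {'a': 1}]
```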
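The best option would be to use the built-in Python library pprint.
Here, stuff is a dictionary object.
If the JSON is in a string, you can load it using
stuff = json.loads(text)
otherwise, if it is in a file, open the file and pass the file object (not the path) to json.load:
with open(file_path) as f:
    stuff = json.load(f)
This assumes the input is a single JSON document; json.load raises the same "Extra data" error on the concatenated format from the question. As far as printing is concerned, pprint will do the job for you:
pprint.pp(stuff)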
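A self-contained version of the above for a single JSON document, with io.StringIO standing in for the file opened from file_path: json.load reads from the file object, and pprint.pp (available since Python 3.8) prints the result.

```python
import io
import json
import pprint

# io.StringIO stands in for open(file_path); json.load expects a file object
with io.StringIO('{"a": 0, "b": [1, 2, 3]}') as f:
    stuff = json.load(f)

pprint.pp(stuff)  # {'a': 0, 'b': [1, 2, 3]}
```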