I have downloaded one day’s worth of Twitter (X) data from the Internet Archive’s Twitter Stream Grab here: https://archive.org/details/twitterstream. The data is provided as JSON files.
However, when I load one of the files into an online JSON viewer, it reports ‘invalid JSON’. When I read the same file with Python's built-in “json” module, I get an error:
import json
with open(file, 'r') as f:
    data = json.load(f)
JSONDecodeError: Extra data: line 2 column 1 (char 4599)
Has anyone successfully utilized this resource? If the JSON files are formatted correctly then why am I unable to parse them?
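One thing I have not ruled out: "Extra data: line 2 column 1" means the parser finished one complete JSON value and then found more content at the start of line 2. That would fit a file that holds one JSON object per line (often called JSON Lines or NDJSON) rather than a single JSON document. Below is a minimal sketch for testing that idea. The helper name read_json_lines and the sample records are made up for illustration, and I am only assuming the archive files follow this layout.

```python
import io
import json


def read_json_lines(stream):
    """Parse a text stream that holds one JSON document per line.

    Hypothetical helper: it assumes a line-delimited layout, which has
    not been confirmed for the archive files.
    """
    records = []
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            # Skip blank lines between records.
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line failed so a malformed record is easy to find.
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records


# Made-up sample standing in for a real file: two objects, one per line.
sample = io.StringIO('{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n')
records = read_json_lines(sample)
print(len(records))
```

With a real file, the same function could be called as read_json_lines(f) inside the existing "with open(...)" block. If every line parses, the files are line-delimited rather than malformed, and json.load fails only because it expects exactly one JSON value per file.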