This issue has been the bane of my afternoon and evening. I have many “jsonl” files (plain text files where each line is a JSON response) that contain what I believed were, in theory at least, simple errors. Hours of regex and reformatting attempts beg to differ.
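Each line of a .jsonl file is meant to be an independent JSON document. So a reasonable first step is to find out which lines fail to parse at all, as opposed to lines that parse but hold a messy value. Here is a minimal Python sketch. `find_bad_lines` is a hypothetical helper, not part of any library.

```python
import json

def find_bad_lines(lines):
    """Return (line_number, error_message) for each non-blank line json.loads rejects."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines instead of reporting them as errors
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((lineno, str(exc)))
    return bad
```

Usage, with a hypothetical filename: `with open("responses.jsonl", encoding="utf-8") as f: print(find_bad_lines(f))`. An empty list means every line is already valid JSON, and the problem is confined to the string values inside them.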
You can get one of the offending .jsonl files here. I’m pretty sure all the errors are in the nonsense after “content”:
What that should look like is "content": "{"broader_impact_description": ...
In case you’re wondering what happened, ChatGPT-4o decided that ‘returning output in json’ meant prefixing its answers with three backticks and the literal word json (or jsonl), i.e. wrapping them in a Markdown code fence. Then, for some unfathomable reason, it inserted an insane number of newlines.
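If each line parses but its `content` value is a string wrapped in a Markdown fence (three backticks plus json), one option is to strip the fence and parse what is inside. This is a minimal sketch. It assumes each `content` value holds at most one fenced block. `FENCE_RE` and `clean_content` are hypothetical names.

```python
import json
import re

# Matches a whole string wrapped in a Markdown code fence such as ```json ... ```,
# capturing the payload between the fences (assumes at most one fenced block).
FENCE_RE = re.compile(r"^\s*```[A-Za-z]*\s*(.*?)\s*```\s*$", re.DOTALL)

def clean_content(content):
    """Strip a surrounding code fence (if any) and parse the JSON inside."""
    match = FENCE_RE.match(content)
    payload = match.group(1) if match else content.strip()
    # strict=False allows raw control characters (e.g. literal newlines)
    # inside string values; whitespace between tokens is ignored anyway.
    return json.loads(payload, strict=False)
```

`json.loads` already skips whitespace between tokens, so most of the stray newlines are harmless once the fence is gone. `strict=False` covers any newlines that ended up inside string values. To write a cleaned line back out, `json.dumps(clean_content(raw))` gives a compact single-line string.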
Any help or guidance would be much appreciated.