I’m using Generative AI API to return text responses as JSON strings which I intend to feed data into an application in real time. The problem is that often the JSON response provided by GenAI API includes small errors- most commonly with double quotes. These syntax issues in the response JSON string trigger errors in my python code when converting them to JSON.
For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'
As you can see, the value for “test” has multiple double quotations. So if I try to convert this to json, I get an error. What I want to do is utilize regex to convert the double quotations to single quotations. So the result can look as follows:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'
The best I can do is as follows:
def repl_call(m):
preq = m.group(1)
qbody = m.group(2)
qbody = re.sub( r'"', "'", qbody )
return preq + '"' + qbody + '"'
print( re.sub( r'([:[,{]s*)"(.*?)"(?=s*[:,]}])', repl_call, text ))
The following code successfully returns the intended result. However, if I were to add a comma, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}
…the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'
🙁
I’ve presently have tried to improve my AI prompt (prompt engineering) to avoid the double quotations and return only a valid JSON string. This works to some degree, but I still encounter enough errors in syntax that require me to retry the same prompt multiple times- which incurs unnecessary delays and costs.
My question is:
Is there such thing as a common function and REGEX pattern I can apply in python to fix my JSON string so that it properly cleanses syntax errors? Specifically relating to misplaced double quotes.
I’m open to a variety of suggestions, including possible Python packages that can deal with JSON string cleansing. Even any advice on advanced GenAI tools that do JSON enforcement. I presently use Gemeni- which I like a lot. But doesn’t allow JSON enforcement like OpenAI’s API allows more explicitly.
Thank you in advance.