My question should be pretty clear. I proofread legal transcripts that were taken down live during proceedings. My job is to check each transcript after the fact while listening to the audio recording, fixing errors, adding missing words, and removing incorrect ones.
The human-made transcript is roughly 90-95% accurate, let’s say. I have clean audio with one person speaking at a time. In total, three to five people will speak during the proceedings.
The transcripts are always in Q & A format, like this:
Q. So Doctor, what were the results of your x-ray?
A. The patient exhibited signs of bone fracture and osteoarthritis.
Q. And what did you conclude?
A. I had no other conclusions.
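To make the format concrete, here is a rough Python sketch of how I imagine reading a transcript like this into a list of turns. It assumes every turn starts with "Q." or "A." at the beginning of a line and that unmarked lines continue the previous turn; the function name parse_turns is just something I made up for illustration.

```python
import re

# Assumption: a turn starts with "Q." or "A." at the start of a line.
TURN_RE = re.compile(r"^(Q|A)\.\s*(.*)$")

def parse_turns(text):
    """Split a Q & A transcript into a list of (label, text) tuples."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = TURN_RE.match(line)
        if match:
            # New question or answer.
            turns.append([match.group(1), match.group(2)])
        elif turns:
            # No marker: treat as a continuation of the previous turn.
            turns[-1][1] += " " + line
        # Unmarked lines before the first turn are ignored in this sketch.
    return [(label, body) for label, body in turns]
```

Real transcripts also have headers, page breaks, and speaker names for colloquy, which this does not handle.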
What I want to do is write a program (for Windows) that auto-transcribes the audio file, matches its questions and answers to those in the human-made transcript, and then compares the two, favoring the original transcript whenever there is doubt. It should revise an original question or answer only when it has a high degree of confidence. The output should simply be a new text file containing the hybrid transcript. I will still proofread the whole thing afterward, but it should save me many keystrokes.
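Here is a minimal sketch of the comparison step as I picture it, using Python's standard difflib module to align the words of one question or answer. It assumes the speech-to-text tool can give a confidence score between 0 and 1 for each word, which I have not verified for any particular tool; the names merge_words and threshold, and the 0.95 cutoff, are placeholders of mine.

```python
from difflib import SequenceMatcher

def normalize(word):
    """Lowercase and strip surrounding punctuation so only wording is compared."""
    return word.lower().strip(".,;:?!\"'")

def merge_words(human, ai, threshold=0.95):
    """Merge one turn, favoring the human transcript.

    human: list of words from the original transcript.
    ai: list of (word, confidence) pairs; the confidence scores are an
        assumption about what the speech-to-text tool provides.
    """
    human_norm = [normalize(w) for w in human]
    ai_norm = [normalize(w) for w, _ in ai]
    matcher = SequenceMatcher(a=human_norm, b=ai_norm, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both agree: keep the human words (and their punctuation).
            merged.extend(human[i1:i2])
            continue
        ai_span = ai[j1:j2]
        # Accept the AI version only if every differing word is high confidence.
        confident = bool(ai_span) and all(c >= threshold for _, c in ai_span)
        if confident:
            merged.extend(w for w, _ in ai_span)
        else:
            # In doubt, or the AI simply lacks these words: keep the original.
            merged.extend(human[i1:i2])
    return merged
```

For example, if the human transcript reads "signs of bone fracture" and the AI hears "signs of a bone fracture" with high confidence on "a", the merged result includes the extra word; with low confidence it is left alone. This sketch never deletes words that appear only in the original, and accepted AI words lose any punctuation that was attached to the words they replace.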
I need help finding the right speech-to-text software (ideally freeware) and working out the best way to match and compare the two transcripts line by line. Thanks!