I’m trying to use Vosk speech recognition in a Python script, but the result is always:
{
"text" : ""
}
It’s not a problem with my file, because when I run “vosk-transcriber -l fr -i speech3.wav -o test6.txt” from the Windows command prompt it works perfectly and I get a test6.txt with an accurate transcription.
Here is my Python script:
import vosk

# Load the Vosk model
model = vosk.Model("voskSmallFr")

# Initialize the recognizer with the model
recognizer = vosk.KaldiRecognizer(model, 16000)

# Sample audio file for recognition
audio_file = "speech3.wav"

# Open the audio file
with open(audio_file, "rb") as audio:
    while True:
        # Read a chunk of the audio file
        data = audio.read(4000)
        if len(data) == 0:
            break
        # Recognize the speech in the chunk
        recognizer.AcceptWaveform(data)

# Get the final recognized result
result = recognizer.FinalResult()
print(result)
I downloaded and tried every French model available on the official Vosk website (4 in total, since my wav file is in French). The script runs without errors but gives no result, unlike the Windows command…
Any ideas?
Thank you
AcceptWaveform() returns True when silence is detected, and you can then retrieve the result with Result(). If it returns False, you can retrieve a partial result with PartialResult(). FinalResult() means the stream has ended: the buffers are flushed and you retrieve whatever remains, which could be just silence. That is likely why your script, which only calls FinalResult(), prints an empty text.
What you could do is:

import json

text = []
with open(audio_file, "rb") as audio:
    while True:
        data = audio.read(4000)
        if len(data) == 0:
            break
        # if silence detected save result
        if recognizer.AcceptWaveform(data):
            text.append(json.loads(recognizer.Result())["text"])
    # end of stream: flush the buffers and keep what is left
    text.append(json.loads(recognizer.FinalResult())["text"])
and you get a list of sentences.
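If you would rather have a single transcript than a list, the same loop can be wrapped in a small helper. This is only a sketch: the function name transcribe_stream and the chunk_size parameter are made up for illustration and are not part of Vosk. It assumes the recognizer exposes AcceptWaveform(), Result() and FinalResult() as described above, and it drops empty entries, since the final result may be just silence.

```python
import json


def transcribe_stream(recognizer, stream, chunk_size=4000):
    """Feed a binary stream to a Vosk-style recognizer and return the joined text.

    The name and the chunk_size parameter are illustrative, not part of Vosk.
    """
    sentences = []
    while True:
        data = stream.read(chunk_size)
        if len(data) == 0:
            break
        # True means silence was detected, so a result is ready to collect
        if recognizer.AcceptWaveform(data):
            sentences.append(json.loads(recognizer.Result())["text"])
    # End of stream: flush the buffers and keep whatever remains
    sentences.append(json.loads(recognizer.FinalResult())["text"])
    # Skip empty entries (for example trailing silence) before joining
    return " ".join(s for s in sentences if s)
```

With the recognizer and audio_file from the question, you would open the file in "rb" mode as before and call print(transcribe_stream(recognizer, audio)) inside the with block.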