I am trying to write a Python script that recognizes speech from a microphone using continuous recognition. I used the sample code from the Azure Speech service documentation (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-python), but my program never exits the while loop. How can I stop the recognition without typing a command? Is it possible to stop continuous speech recognition with a verbal cue, e.g. a long pause or a spoken keyword? (I have added a rough sketch of the behaviour I am after below my code.)

I am building a voicebot. Am I correct that, in order to interact with it, users must either speak for less than 15 seconds (using single-shot recognition) or interact with the device after every utterance (using continuous recognition)?
Thank you!
Duplicate question, so far without a reply:
https://learn.microsoft.com/en-us/answers/questions/1850234/stop-continuous-speech-recognition-from-microphone?comment=question#newest-question-comment

My code:
import time
from dotenv import dotenv_values
import azure.cognitiveservices.speech as speechsdk


def recognised_speech(evt):
    # Print each final recognition result.
    print(f"You: {evt.result.text}")


def cont_speech_to_text():
    done_talking = False

    def stop_cb(evt):
        # Fired on session_stopped / canceled: flag the loop to exit and stop the recognizer.
        print('You: {}'.format(evt))
        nonlocal done_talking
        done_talking = True
        speech_recognizer.stop_continuous_recognition()

    # speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(recognised_speech)
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    speech_recognizer.start_continuous_recognition()
    # Poll until one of the callbacks sets done_talking; this is the loop that never exits.
    while not done_talking:
        time.sleep(.5)


SPEECH_REGION = "westeurope"
keypath = "..."
speechkey = dotenv_values(keypath + ".key")

speech_config = speechsdk.SpeechConfig(subscription=speechkey['KEY'], region=SPEECH_REGION)
speech_config.speech_recognition_language = "en-US"
speech_config.speech_synthesis_voice_name = 'en-US-AvaMultilingualNeural'
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

cont_speech_to_text()
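To make the second question more concrete, this is roughly the behaviour I am hoping for. Below is a minimal, untested sketch (reusing the speech_recognizer created above): stop when the user says a keyword, or when nothing has been recognized for some number of seconds. The stop keyword "goodbye", the 10-second silence limit, and the function name listen_until_done are placeholders I made up; I do not know whether this is the recommended way to do it with the SDK.

import time
import azure.cognitiveservices.speech as speechsdk


def listen_until_done(speech_recognizer, stop_keyword="goodbye", silence_limit_s=10.0):
    """Run continuous recognition until a stop keyword is heard
    or nothing has been recognized for silence_limit_s seconds."""
    done_talking = False
    last_heard = time.monotonic()

    def on_recognized(evt):
        nonlocal done_talking, last_heard
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech and evt.result.text:
            print(f"You: {evt.result.text}")
            last_heard = time.monotonic()
            # Verbal stop command.
            if stop_keyword in evt.result.text.lower():
                done_talking = True

    def on_stopped(evt):
        # Exit the loop if the session stops or is canceled for any reason.
        nonlocal done_talking
        done_talking = True

    speech_recognizer.recognized.connect(on_recognized)
    speech_recognizer.session_stopped.connect(on_stopped)
    speech_recognizer.canceled.connect(on_stopped)

    speech_recognizer.start_continuous_recognition()
    while not done_talking:
        time.sleep(0.5)
        # "Long pause" stop condition: no recognized speech for a while.
        if time.monotonic() - last_heard > silence_limit_s:
            done_talking = True
    speech_recognizer.stop_continuous_recognition()

If something like this is a reasonable approach, I would just call listen_until_done(speech_recognizer) instead of cont_speech_to_text().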