I'm running the Azure speech-to-text container on-premises on an OpenShift cluster, using the latest version: 4.7.0-amd64-fr-fr. Alongside it I have a small containerized Flask API that uses the Azure Cognitive Services Speech SDK to interact with the container; it's the only way for me to use the WebSocket protocol in my organization.
The methods I implemented with `speechsdk.SpeechRecognizer` all work fine, but with `speechsdk.transcription.ConversationTranscriber` the transcription is systematically canceled with this error:
CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=edfb67ce81194adebba896e90996e20d, result=ConversationTranscriptionResult(result_id=614535e47f7b48ba9f7c2b22c1071c07, speaker_id=, text=, reason=ResultReason.Canceled)) CancellationDetails(reason=CancellationReason.Error, error_details="WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Not Found ws://<service>.<namespace>.svc.cluster.local/speech/universal/v2?language=fr-FR X-ConnectionId: edfb67ce81194adebba896e90996e20d Please check request details. SessionId: edfb67ce81194adebba896e90996e20d")
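For reference, here is a minimal sketch of how the HTTP status and the URL the SDK tried to open can be pulled out of that error string, which makes it easier to see which path is being requested. `parse_upgrade_error` is a hypothetical helper, not part of the SDK, and the host name in the sample string is a placeholder.

```python
import re
from urllib.parse import urlsplit


def parse_upgrade_error(error_details: str) -> dict:
    """Extract the HTTP status and the requested URL parts from an error string."""
    status = re.search(r"HTTP (\d{3})", error_details)
    url = re.search(r"\bwss?://\S+", error_details)
    parts = urlsplit(url.group(0)) if url else None
    return {
        "status": int(status.group(1)) if status else None,
        "host": parts.netloc if parts else None,
        "path": parts.path if parts else None,
        "query": parts.query if parts else None,
    }


# Sample string modeled on the error above; the host is a placeholder.
details = (
    "WebSocket upgrade failed: Internal service error (404). "
    "Error Details: Failed with HTTP 404 Not Found "
    "ws://speech.example.svc.cluster.local/speech/universal/v2?language=fr-FR "
    "X-ConnectionId: edfb67ce81194adebba896e90996e20d"
)
info = parse_upgrade_error(details)
print(info)
```

In my case the path that comes back is `/speech/universal/v2`, which is the one answering with 404.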
I'm on Red Hat Enterprise Linux 8 (RHEL 8).
Here is my code:
```python
import os
import time

import azure.cognitiveservices.speech as speechsdk

# `app` (the Flask application) and `handle_file_d` (my own decorator)
# are defined elsewhere in the API.
host = "ws://<service>.<namespace>.svc.cluster.local:80"


@app.route('/conversation', methods=['POST'])
@handle_file_d
def conversation_transcription(filename):
    done = False
    speech_config = speechsdk.SpeechConfig(host=host)
    audio_config = speechsdk.audio.AudioConfig(filename=filename)
    speech_config.speech_recognition_language = os.environ.get('SPEECH_LANGUAGE')
    conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)

    def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
        print('Canceled event')

    def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
        print('SessionStarted event')

    def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
        print('SessionStopped event')

    def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            text = '\tText={}\n'.format(evt.result.text)
            speaker_id = '\tSpeaker ID={}\n'.format(evt.result.speaker_id)
            print(f"{speaker_id}, {text}")
        elif evt.result.reason == speechsdk.ResultReason.NoMatch:
            print('\tNOMATCH: Speech could not be TRANSCRIBED: {}\n'.format(
                evt.result.no_match_details))

    def start_transcribing_async_cb(evt):
        print('Start transcribing event')

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """Callback that signals to stop continuous recognition upon receiving an event `evt`."""
        nonlocal done
        print('CLOSING on {}'.format(evt))
        # session_stopped events carry no result, so guard the attribute access
        result = getattr(evt, 'result', None)
        if result is not None and result.reason == speechsdk.ResultReason.Canceled:
            print(result.cancellation_details)
        done = True

    conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
    conversation_transcriber.session_started.connect(conversation_transcriber_session_started_cb)
    conversation_transcriber.session_stopped.connect(conversation_transcriber_session_stopped_cb)
    conversation_transcriber.canceled.connect(conversation_transcriber_recognition_canceled_cb)
    # stop transcribing on either session stopped or canceled events
    conversation_transcriber.session_stopped.connect(stop_cb)
    conversation_transcriber.canceled.connect(stop_cb)

    conversation_transcriber.start_transcribing_async()
    while not done:
        time.sleep(.5)
    conversation_transcriber.stop_transcribing_async()
    return "done"
```
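As a side note unrelated to the 404, the `while not done` polling loop could probably be replaced with a `threading.Event`. The sketch below is independent of the Speech SDK: the `threading.Timer` is only a stand-in (an assumption for illustration) for the SDK firing a session-stopped or canceled callback from another thread.

```python
import threading

done = threading.Event()


def stop_cb(evt):
    """Signal the waiting thread; meant to be called from a callback thread."""
    print(f"CLOSING on {evt}")
    done.set()


# Stand-in for the SDK invoking the callback from its own thread.
timer = threading.Timer(0.1, stop_cb, args=("fake session_stopped event",))
timer.start()

# Blocks until stop_cb runs, or gives up after 5 seconds.
finished = done.wait(timeout=5.0)
timer.join()
print("finished" if finished else "timed out")
```

The timeout also avoids hanging the request forever if neither event ever arrives.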
I've tried different versions of the Python SDK (I'm currently on 1.37.0) and different versions of the speech-to-text image, but I still get the same error.
The transcription never truly starts; the request never seems to reach the speech-to-text container.
Many thanks 🙂