I’m using Solara to build a web app in Python, and I can capture audio from the client’s browser with ipywebrtc. I can record the audio to a temporary file first and then pass it to Azure Speech, but I need the transcription to be streaming.
At first, I tried this code:
from ipywebrtc import CameraStream, AudioRecorder
from azure.cognitiveservices.speech.audio import AudioInputStream
import azure.cognitiveservices.speech as speechsdk
...
speech_config = speechsdk.SpeechConfig(subscription="...", region="brazilsouth")
camera = CameraStream(constraints={'audio': True, 'video': False})
recorder = AudioRecorder(stream=camera)
stream = AudioInputStream(recorder.audio)
audio_config = speechsdk.audio.AudioConfig(stream=stream)
conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
...
This fails with the following traceback:
Exception in thread Thread-22 (recognize_from_device):
Traceback (most recent call last):
  File "C:\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Python312\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "L:\projects\testes\functions\ouvir_microfone.py", line 62, in recognize_from_device
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "L:\projects\testes\venv\Lib\site-packages\azure\cognitiveservices\speech\audio.py", line 382, in __init__
    _call_hr_fn(fn=_sdk_lib.audio_config_create_audio_input_from_stream, *[ctypes.byref(handle), stream._handle])
  File "L:\projects\testes\venv\Lib\site-packages\azure\cognitiveservices\speech\interop.py", line 61, in _call_hr_fn
    hr = fn(*args) if len(args) > 0 else fn()
         ^^^^^^^^^
ctypes.ArgumentError: argument 2: TypeError: Don't know how to convert parameter 2
Exception ignored in: <function _Handle.__del__ at 0x00000140E12B5E40>
Traceback (most recent call last):
  File "L:\projects\testes\venv\Lib\site-packages\azure\cognitiveservices\speech\interop.py", line 105, in __del__
    elif self.__test_fn(self.__handle):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ctypes.ArgumentError: argument 1: TypeError: Don't know how to convert parameter 1
Changing AudioInputStream(recorder.audio) to AudioInputStream(recorder.codecs) gives a different error:
RuntimeError: Exception with error code:
[CALL STACK BEGIN]
> GetModuleObject
- audio_config_get_audio_processing_options
- audio_config_create_audio_input_from_stream
- ffi_prep_go_closure
- ffi_call_go
- ffi_call
- 00007FF910133DD5 (SymFromAddr() error: Attempt to access invalid address.)
- 00007FF910132D33 (SymFromAddr() error: Attempt to access invalid address.)
- 00007FF910132928 (SymFromAddr() error: Attempt to access invalid address.)
- PyObject_Call
- PyEval_EvalFrameDefault
- PyFunction_Vectorcall
- PyObject_VectorcallMethod
- PyObject_Vectorcall
- PyObject_Vectorcall
- PyEval_EvalFrameDefault
[CALL STACK END]
Exception with an error code: 0x5 (SPXERR_INVALID_ARG)
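For context, here is a rough sketch of the shape I think I need, so it is clear what I mean by "streaming". As far as I can tell, AudioInputStream is only a base class, and AudioConfig(stream=...) wants a PushAudioInputStream (or a pull stream) that receives raw PCM bytes through write(). The helper below only shows the chunked-write part and runs without the Azure SDK. The names push_chunks, PcmSink, and ListSink are made up for illustration, and the 16 kHz, 16-bit, mono PCM format is an assumption: I believe the audio ipywebrtc records is encoded (webm), so it would have to be decoded to PCM first, which this sketch does not cover.

```python
from typing import Iterator, Protocol


class PcmSink(Protocol):
    """Hypothetical interface: anything that accepts raw PCM via write(bytes)."""

    def write(self, buffer: bytes) -> None: ...


def iter_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_size bytes.

    3200 bytes is 100 ms of audio under the assumed format
    (16000 samples/s * 2 bytes per sample * 1 channel * 0.1 s).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def push_chunks(sink: PcmSink, pcm: bytes, chunk_size: int = 3200) -> int:
    """Write pcm to the sink chunk by chunk and return how many chunks were written."""
    count = 0
    for chunk in iter_chunks(pcm, chunk_size):
        sink.write(chunk)
        count += 1
    return count


class ListSink:
    """In-memory sink, used here only to exercise push_chunks without the SDK."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def write(self, buffer: bytes) -> None:
        self.chunks.append(buffer)


if __name__ == "__main__":
    sink = ListSink()
    written = push_chunks(sink, b"\x00" * 8000)
    print(written, [len(c) for c in sink.chunks])
```

With the real SDK, my understanding is that the sink would be speechsdk.audio.PushAudioInputStream(stream_format=speechsdk.audio.AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)), passed as speechsdk.audio.AudioConfig(stream=sink), with push_chunks called each time a new buffer of decoded audio arrives and sink.close() called when recording stops. What I cannot work out is how to get such buffers out of ipywebrtc while the recording is still in progress.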