I’m developing an application that requires Automatic Speech Recognition (ASR). Currently, I can only process audio input from pre-recorded videos or a microphone. However, I need to capture audio directly from playing videos or speaker output and pass it to my ASR module.
I found that I can use the pydub and speech_recognition libraries to process audio from saved files. That works fine, but it doesn’t handle real-time audio streams.
Ideally, the input would come from the speaker output, i.e. whatever audio the system is currently playing.
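For context, the file-based approach with pydub and speech_recognition might look roughly like the minimal sketch below. The names `wav_path_for` and `transcribe_file` are hypothetical, and the use of `recognize_google` is an assumption; the actual setup may use a different recognizer backend. It only covers saved files, which is exactly the limitation described here.

```python
import os
import tempfile


def wav_path_for(src_path, out_dir=None):
    """Hypothetical helper: pick a path for the WAV copy of a media file."""
    out_dir = out_dir or tempfile.gettempdir()
    stem = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, stem + ".wav")


def transcribe_file(src_path):
    """Sketch: convert a saved audio/video file to WAV, then run ASR on it."""
    # Third-party imports are kept local so the helper above has no dependencies.
    from pydub import AudioSegment  # needs ffmpeg installed for non-WAV input
    import speech_recognition as sr

    # Extract the audio track and write it out as WAV.
    wav_path = wav_path_for(src_path)
    AudioSegment.from_file(src_path).export(wav_path, format="wav")

    # Read the whole WAV file into memory and send it to the recognizer.
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)

    # Assumption: Google Web Speech backend; other recognize_* methods exist.
    return recognizer.recognize_google(audio)
```

Calling `transcribe_file("talk.mp4")` (a placeholder file name) would return the recognized text for a file on disk. Nothing in this flow reads from the audio the system is playing.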