I'm trying to transcribe audio to text with Google Speech-to-Text using Python. The audio comes from a Twilio phone call, which I receive over a WebSocket stream.
@app.route("/receive_call", methods=['GET', 'POST'])
def receive_call():
    if request.method == 'POST':
        xml = f"""
<Response>
    <Say>
        Speak to see your speech transcribed in the console
    </Say>
    <Connect>
        <Stream url='wss://{request.host}{WEBSOCKET_ROUTE}' />
    </Connect>
</Response>
""".strip()
        return Response(xml, mimetype='text/xml')
    else:
        return "Real-time phone call transcription app"
Here is the WebSocket code:
@sock.route(WEBSOCKET_ROUTE)
def transcription_websocket(ws):
    # audio_generator(ws)
    while True:
        data = json.loads(ws.receive())
        match data['event']:
            case "connected":
                transcriber = TwilioTranscriber()
                transcriber.connect()
                print('transcriber connected')
            case "start":
                print('twilio started')
            case "media":
                payload_b64 = data['media']['payload']
                chunk = base64.b64decode(payload_b64)
                transcribe = transcribe_audio(chunk)
                print(transcribe)
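
Each media event carries a base64-encoded payload. A small helper like the sketch below (the helper name is mine, purely for inspection) shows how little audio each decoded frame actually contains, since Twilio's media stream sends 8 kHz mu-law audio in very short frames:

import base64

def log_frame_size(payload_b64: str) -> bytes:
    # Illustrative diagnostic only: decode one Twilio media frame and report its size.
    # Twilio Media Streams deliver 8 kHz mu-law audio, base64-encoded, in small frames,
    # so a single decoded chunk holds only a fraction of a second of speech.
    chunk = base64.b64decode(payload_b64)
    print(f"decoded frame: {len(chunk)} bytes")
    return chunk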
transcribe_audio() returns None for every media chunk, so the print only ever shows None.

So far I have tried transcribing the audio with the Google Speech-to-Text API using both MULAW and LINEAR16 encodings, but I always get None as output.

Tried with:

**Encoding**: MULAW or LINEAR16
**Sample rate**: 8000 or 16000

None of these combinations work (the LINEAR16 variant of the config is shown after the function below).
from google.cloud import speech

def transcribe_audio(audio_content):
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(content=audio_content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.MULAW,
        sample_rate_hertz=8000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        return result.alternatives[0].transcript
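
The LINEAR16 variant I mentioned above only differs in the config; it looked roughly like this:

from google.cloud import speech

# The LINEAR16 / 16000 Hz variant mentioned above -- this also ends up returning None.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)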
Note: I am using ngrok to expose the locally running WebSocket.
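
The tunnel is started with something like this (assuming the Flask app runs on its default port 5000):

ngrok http 5000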