I'm using Google Speech-to-Text and Deepgram to test this; both give me the same result.
When I make a Twilio call I sometimes get really weird behaviour: the audio seems to be repeated multiple times for no reason. I don't know if it's the way I'm handling the WebSocket or the way I'm sending the audio.
It may look like I said all of that very fast and at the same time, but that is not the case: when I speak I leave about a 5-second pause between sentences, yet they come back as if I had said everything at once.
This happens in only about 40% of the calls; the rest work as intended.
The response from Deepgram or Google Speech comes back like this:
non final: hello what's your name
non final: hello what's your name can
non final: hello what's your name can you
non final: hello what's your name can you hear
non final: hello what's your name can you hear me
non final: hello what's your name can you hear me hello
non final: hello what's your name can you hear me hello
non final: hello what's your name can you hear me hello
non final: hello what's your name can you hear me hello
non final: hello what's your name can you hear me hello are
non final: hello what's your name can you hear me hello are you
non final: hello what's your name can you hear me hello are you there
non final: hello what's your name can you hear me hello are you there
non final: hello what's your name can you hear me hello are you there hello
non final: hello what's your name can you hear me hello are you there hello
non final: hello what's your name can you hear me hello are you there hello bro
non final: hello what's your name can you hear me hello are you there hello bro
non final: hello what's your name can you hear me hello are you there hello bro
non final: hello what's your name can you hear me hello are you there hello bro
non final: hello what's your name can you hear me hello are you there hello bro what
non final: hello what's your name can you hear me hello are you there hello bro what
non final: hello what's your name can you hear me hello are you there hello bro what the
non final: hello what's your name can you hear me hello are you there hello bro what the
non final: hello what's your name can you hear me hello are you there hello bro what the hell
non final: hello what's your name can you hear me hello are you there hello bro what the hell
non final: hello what's your name can you hear me hello are you there hello bro what the hell is
non final: hello what's your name can you hear me hello are you there hello bro what the hell is
Below is the code that receives the call from Twilio:
# Initiate the connection and hand the call off to the WebSocket stream
@application.post('/call')  # make phone call (outgoing)
async def handle_incoming_calls(request: Request, From: Annotated[str, Form()], To: Annotated[str, Form()]):
    response = VoiceResponse()
    connect = Connect()
    URL = f"wss://{PUBLIC_URL}/stream"
    connect.stream(url=URL)
    response.append(connect)
    return Response(content=str(response), media_type='text/xml')
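For reference, this is roughly the TwiML that handler returns; a quick standalone check (with a placeholder PUBLIC_URL, not my real domain) prints:

from twilio.twiml.voice_response import VoiceResponse, Connect

PUBLIC_URL = "example.ngrok.io"  # placeholder host, not the real one
response = VoiceResponse()
connect = Connect()
connect.stream(url=f"wss://{PUBLIC_URL}/stream")
response.append(connect)
print(str(response))
# <?xml version="1.0" encoding="UTF-8"?>
# <Response><Connect><Stream url="wss://example.ngrok.io/stream" /></Connect></Response>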
That function sends the media stream to the /stream endpoint, which handles the WebSocket:
@application.websocket('/stream')
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        await wait_for_user_input(websocket)
The wait_for_user_input function (with its streaming config):
config = RecognitionConfig(
    encoding=RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    language_code="en",
    model="telephony",
    enable_automatic_punctuation=False,
)
streaming_config = StreamingRecognitionConfig(config=config, interim_results=True)


async def wait_for_user_input(ws):
    transcript = ""
    transcript_ready = False

    def on_transcription_response(response):
        nonlocal transcript
        nonlocal transcript_ready
        if not response.results:
            return
        result = response.results[0]
        if not result.alternatives:
            return
        transcription = result.alternatives[0].transcript
        if result.is_final is True:
            print("\nTRANSCRIPT FINAL", transcription)
        else:
            print("non final:", transcription)

    print("WS connection opened")
    bridge = SpeechClientBridge(streaming_config, on_transcription_response)
    t = threading.Thread(target=bridge.start)
    t.start()

    while True:
        message = await ws.receive_text()
        if message is None:
            bridge.add_request(None)
            bridge.terminate()
            break
        data = json.loads(message)
        if data['event'] == 'media':
            media = data['media']
            if media['track'] == 'inbound':
                chunk = base64.b64decode(media["payload"])
                bridge.add_request(chunk)
        if data["event"] == "stop":
            print(f"Media WS: Received event 'stop': {message}")
            print("Stopping...")
            break
        if transcript_ready:
            print("Transcript: ", transcript)
            transcript_ready = False

    bridge.terminate()
    print("WS connection closed")
The same thing happens with Deepgram.
I tried different STT solutions and I run into the same issue with all of them.
Then I thought it was Twilio, but they checked all my calls and determined there were no issues with the calls themselves.