I’m a newbie to Python and I’ve hit a bit of a roadblock in my current project. I’ve been at it for a while now and tried searching for solutions, but I’m still stuck. Any help would be greatly appreciated!
Project Setup:
Backend: FastAPI app
Frontend: React app
Text-to-Speech: Azure Cognitive Services Speech SDK
Goal:
I’m building a route in my FastAPI app that streams audio data from Azure Text-to-Speech to the React frontend.
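For context, the pattern I’m relying on is FastAPI’s StreamingResponse wrapped around an async generator. A stripped-down, stdlib-only model of just that pattern (fake byte chunks standing in for audio, no Azure or HTTP involved) looks like this:

```python
import asyncio

# A simplified model of the streaming pattern: an async generator yields
# byte chunks, and FastAPI's StreamingResponse (not shown here) would
# forward each chunk to the client as it is produced.
async def fake_audio_stream():
    for chunk in (b'chunk1', b'chunk2', b'chunk3'):
        yield chunk

async def consume():
    # Stands in for the HTTP client collecting the streamed body
    received = b''
    async for chunk in fake_audio_stream():
        received += chunk
    return received

print(asyncio.run(consume()))  # b'chunk1chunk2chunk3'
```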
Code Snippet:
# Define the text-to-speech stream function
async def text_to_speech_stream(text):
    try:
        result = speech_synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            audio_data_stream = speechsdk.AudioDataStream(result)
            # Read data from the audio data stream and process it in memory.
            # Reset the stream position to the beginning.
            audio_data_stream.position = 0

            # Prepare an in-memory stream for the WAV file
            wav_header = io.BytesIO()
            wf = wave.open(wav_header, 'wb')
            wf.setnchannels(1)      # Mono channel
            wf.setsampwidth(2)      # Sample width in bytes (16-bit)
            wf.setframerate(16000)  # Sample rate in Hz
            # First, write empty data so the WAV header is emitted
            wf.writeframes(b'')

            # Continue to stream the actual audio data
            audio_buffer = bytes(16000)
            while True:
                filled_size = audio_data_stream.read_data(audio_buffer)
                if filled_size > 0:
                    print(f"{filled_size} bytes received.")
                    # Write to the in-memory WAV file
                    wf.writeframes(audio_buffer[:filled_size])
                    # Go to the start to read what was just written and stream it
                    wav_header.seek(0)
                    data_to_send = wav_header.read()  # Read all data to send
                    yield data_to_send  # Yield the read data
                else:
                    break
            wf.close()
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print(f"Speech synthesis canceled: {cancellation_details.reason}")
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print(f"Error details: {cancellation_details.error_details}")
    except Exception as ex:
        print(f"Error synthesizing audio: {ex}")
        yield b''
# Define the text-to-speech chat route
@app.post("/chat-text-to-speech/")
async def chat_text_to_speech(data: Chat_Request):
    # Send the text to the LLM and get the response
    # response = response_from_LLM(data.message)
    response = {"response": "this is an example of text to speech, I need you to speek to me. this is really important."}
    # Check if the response is successful
    if response:
        # Convert the response to speech
        audio_stream = text_to_speech_stream(response["response"])
        if audio_stream:
            return StreamingResponse(audio_stream, media_type='audio/wav')
        else:
            return {"error": "Unable to synthesize audio"}
    else:
        return {"error": "Unable to get response from LLM"}
Problem:
When I use Postman to send a request to the /chat-text-to-speech/ route, I only receive the initial chunk of the audio data, which matches the size of audio_buffer (16000 bytes). If I increase the buffer size to 160000 bytes, I get a longer audio clip with the correct content. My terminal shows the following when text_to_speech_stream runs:
“””
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
16000 bytes received.
15200 bytes received.
“””
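Summing the chunk sizes from that log, Azure itself seems to deliver the whole utterance, so the truncation presumably happens on my side:

```python
# Sanity check on the terminal output above: 12 full chunks plus one
# partial chunk, at 16 kHz 16-bit mono (32000 bytes per second).
chunks = [16000] * 12 + [15200]
total = sum(chunks)
seconds = total / (16000 * 2)
print(total)    # 207200
print(seconds)  # 6.475 (about 6.5 seconds of audio)
```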
Question:
I’m not sure how to modify my code so that the client receives the entire audio stream from Azure TTS. Is the issue in how I’m reading audio_data_stream, or in how I’m handling the in-memory WAV file creation?
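If it helps, here is a stdlib-only repro of just the in-memory WAV part of my loop (no Azure involved, two tiny fake chunks standing in for audio data). It shows that each yield re-sends everything written so far, header included, which may or may not be related to my problem:

```python
import io
import wave

# Rebuild the in-memory WAV exactly as in text_to_speech_stream
buf = io.BytesIO()
wf = wave.open(buf, 'wb')
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(16000)
wf.writeframes(b'')  # force the 44-byte header out, as in my code

sent = []
for chunk in (b'\x01\x00\x02\x00', b'\x03\x00\x04\x00'):
    wf.writeframes(chunk)
    buf.seek(0)
    sent.append(buf.read())  # what `yield data_to_send` would send

# Each "yield" contains the header plus ALL audio written so far,
# not just the new chunk:
print(len(sent[0]), len(sent[1]))        # 48 52
print(b'\x01\x00\x02\x00' in sent[1])    # True
```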
Additional Notes:
I’ve gone through the Azure Speech SDK documentation and checked their GitHub examples (azure speech example), but I haven’t found a solution for this specific scenario.
Any pointers or suggestions would be a huge help!
Thanks in advance!