In my code I'm sending a request to a text-to-speech API and receiving a stream of data that represents audio chunks in MP3 format. I write the data to an MP3 file and it works perfectly:
import requests

def textToSpeech(text):
    CHUNK_SIZE = 1024
    url = "https://api.elevenlabs.io/v1/text-to-speech/pNInz6obpgDQGcFmaJgB"
    headers = {
        "Accept": "audio/mpeg",
        "Content-Type": "application/json",
        "xi-api-key": "mykey",
    }
    data = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    response = requests.post(url, json=data, headers=headers)
    if response.status_code == 200:
        # Write the MP3 bytes to disk chunk by chunk.
        with open("original/test.mp3", "wb") as f:
            for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
                if chunk:
                    f.write(chunk)
    return response.status_code
I want to write the data to a WAV file instead of an MP3 file. I know I can easily convert the saved file to WAV afterwards, but I want to avoid the extra I/O operations because I'm calling this function a lot.
Simply opening a WAV file and writing the response bytes to it results in a corrupted file.
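To illustrate why that fails: a WAV file starts with a RIFF header describing uncompressed PCM data, while the response body holds MP3 data, so a player that trusts the .wav extension cannot parse it. Below is a minimal sketch that checks the leading bytes. The helper name sniff_audio_container is hypothetical, and the check is only a rough heuristic, not a full format parser.

```python
def sniff_audio_container(data: bytes) -> str:
    """Guess the audio container from the leading bytes (rough heuristic)."""
    # WAV layout: b"RIFF", a 4-byte size field, then b"WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # An MP3 file carrying an ID3v2 tag starts with b"ID3".
    if data[:3] == b"ID3":
        return "mp3"
    # A bare MPEG audio frame starts with 11 set sync bits:
    # 0xFF followed by a byte whose top three bits are set.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Calling it on the first chunk of the response should report "mp3" even when the target file is named test.wav, assuming the API returns MP3 as requested by the Accept header.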
Another approach is to use the wave library to open a WAV file and write the decoded stream to it:
with wave.open("original/test.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(44100)
    for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
        if chunk:
            audio_data = AudioSegment.from_mp3(io.BytesIO(chunk))
            f.writeframes(audio_data.raw_data)
But the resulting audio quality is not as good as I expected, even though the number of channels, sample width, and frame rate are the values expected from the response.
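One possible explanation, not confirmed here, is that 1024-byte network chunks do not line up with MP3 frame boundaries, so decoding each chunk on its own can lose or distort audio at the seams. For comparison, here is a minimal sketch that buffers the whole response in memory and decodes it once. It assumes pydub with an ffmpeg backend is installed, and the helper names join_chunks and mp3_bytes_to_wav are hypothetical.

```python
import io


def join_chunks(chunks) -> bytes:
    """Concatenate the non-empty chunks of a streamed response."""
    return b"".join(chunk for chunk in chunks if chunk)


def mp3_bytes_to_wav(mp3_bytes: bytes, wav_path: str) -> None:
    """Decode a complete MP3 byte string once and write it out as WAV."""
    # Imported lazily so join_chunks can be used without pydub installed.
    from pydub import AudioSegment

    # Decode from memory, so no intermediate MP3 file touches the disk.
    segment = AudioSegment.from_file(io.BytesIO(mp3_bytes), format="mp3")
    segment.export(wav_path, format="wav")
```

Inside textToSpeech this would be called as mp3_bytes_to_wav(join_chunks(response.iter_content(chunk_size=CHUNK_SIZE)), "original/test.wav"), which keeps the disk I/O to a single write of the WAV file.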