First, I apologize if anything is misspelled or if I don't explain myself clearly enough, as English is not my first language.
Now to the problem. I am just starting to work with both the OpenAI API and WebSockets. I understand the basics and have started building with them. My problem starts when I try to stream audio from OpenAI's text-to-speech API using the streaming response method and then pass the chunks to my web browser. It does work, but the audio I get is awful.
The code that streams the chunks works pretty well when I run it directly in VS Code, that is, without any web service at all. This is the original code (I got it from the OpenAI forum):
```python
import pyaudio
from openai import OpenAI

p = pyaudio.PyAudio()
# 16-bit signed samples (pyaudio.paInt16 == 8), mono, 24 kHz
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24_000,
                output=True)

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="nova",
    input="some text I want to stream",
    response_format="mp3"
) as response:
    # Write each 1024-byte chunk to the sound card as it arrives
    for chunk in response.iter_bytes(1024):
        stream.write(chunk)
```
As you can see, the audio is streamed directly and not passed through any WebSocket. That works pretty well.
Now, when I tried to use that code with WebSockets, I replaced `stream.write(chunk)` with `socketio.emit('audio_chunk', chunk)`.
I received the chunks in script.js with the following code:
```javascript
// Setup this snippet relies on (assumed; Socket.IO client loaded on the page)
const socket = io();
const audioContext = new AudioContext();
const audioBufferQueue = [];
let sourceNode = null;

socket.on('audio_chunk', function (chunk) {
  const arrayBuffer = new Uint8Array(chunk).buffer;
  audioContext.decodeAudioData(arrayBuffer, (buffer) => {
    audioBufferQueue.push(buffer);
    playNextInQueue();
  }, (error) => {
    console.error('Error decoding audio data:', error);
  });
});

// Play queued buffers one at a time; the next one starts when the current one ends
function playNextInQueue() {
  if (sourceNode || audioBufferQueue.length === 0) return;
  const buffer = audioBufferQueue.shift();
  sourceNode = audioContext.createBufferSource();
  sourceNode.buffer = buffer;
  sourceNode.connect(audioContext.destination);
  sourceNode.onended = () => {
    sourceNode = null;
    playNextInQueue();
  };
  sourceNode.start();
}
```
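One likely source of the distortion is that `decodeAudioData` is called on each 1024-byte slice independently. MDN notes that it only works on complete file data, not fragments, and an arbitrary slice of an MP3 stream is not guaranteed to start or end on a frame boundary, so each piece may decode with missing or garbled audio at its edges. A common alternative, sketched here under the assumption that the server requests `response_format="pcm"` instead of `"mp3"` (OpenAI describes PCM output as raw 24 kHz, 16-bit signed little-endian samples without a header, which matches the 16-bit, 24 kHz, mono settings of the PyAudio stream above), is to skip decoding and convert the bytes directly to the float samples Web Audio uses. The name `createPcm16Converter` is hypothetical.

```javascript
// Sketch: convert raw 16-bit little-endian PCM bytes into Float32 samples.
// Assumes the server streams response_format="pcm" (24 kHz, mono), not MP3.
function createPcm16Converter() {
  let leftover = null; // unpaired trailing byte carried over to the next chunk

  return function convert(chunk) {
    let bytes = new Uint8Array(chunk);

    if (leftover !== null) {
      // Prepend the byte left over from the previous chunk
      const joined = new Uint8Array(bytes.length + 1);
      joined[0] = leftover;
      joined.set(bytes, 1);
      bytes = joined;
      leftover = null;
    }

    if (bytes.length % 2 === 1) {
      // A sample needs two bytes: keep the odd one for the next call
      leftover = bytes[bytes.length - 1];
      bytes = bytes.subarray(0, bytes.length - 1);
    }

    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const samples = new Float32Array(bytes.length / 2);
    for (let i = 0; i < samples.length; i++) {
      // Scale the Int16 range [-32768, 32767] to roughly [-1, 1)
      samples[i] = view.getInt16(i * 2, true) / 32768;
    }
    return samples;
  };
}
```

In the browser, each returned `Float32Array` can be copied into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, 24000)` and `buffer.copyToChannel(samples, 0)`, so no decoding step is involved.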
Like I said at the beginning, it does work, but the audio that comes through is awful. Sometimes I can't understand what it says, and I don't know what else to try. I know I can generate the full audio file with other OpenAI tools, but if the input is long, it takes quite a while to finish.
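Even with intact buffers, the queue in `playNextInQueue` waits for each `onended` event before starting the next source. That event is dispatched asynchronously to the main thread, so a short gap can open between buffers, and with small chunks it happens many times per second (for example, 1024 bytes of 24 kHz 16-bit mono PCM is 512 samples, about 21 ms of audio). A common pattern is to schedule every buffer at an absolute time on the `AudioContext` clock with `start(when)`, keeping a running end time. The sketch below assumes mono `Float32Array` samples at 24 kHz; the class name `PcmScheduler` and the 50 ms safety margin are illustrative choices, not part of any library.

```javascript
// Sketch: schedule PCM buffers back-to-back on the AudioContext clock
// instead of waiting for each "ended" event. PcmScheduler is a hypothetical name.
class PcmScheduler {
  constructor(audioContext, sampleRate = 24000) {
    this.ctx = audioContext;
    this.sampleRate = sampleRate; // must match the PCM the server sends
    this.nextStartTime = 0;       // context time at which the next buffer should begin
  }

  // samples: Float32Array of mono samples in [-1, 1]; returns the scheduled start time
  enqueue(samples) {
    const buffer = this.ctx.createBuffer(1, samples.length, this.sampleRate);
    buffer.copyToChannel(samples, 0);

    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(this.ctx.destination);

    // Play right after the previous buffer; if playback fell behind
    // (e.g. a slow network), restart 50 ms ahead of the current time
    const startAt = Math.max(this.nextStartTime, this.ctx.currentTime + 0.05);
    source.start(startAt);
    this.nextStartTime = startAt + buffer.duration;
    return startAt;
  }
}
```

Browsers may keep a new `AudioContext` suspended until a user gesture, so `audioContext.resume()` typically has to be called from something like a click handler before any scheduled audio is heard.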
I suspect my JavaScript is the problem, as I'm really new to it. Does anyone know what I can do to get good audio quality?
Alhelí Cabrera is a new contributor to this site.