I have a web application with a FastAPI backend and a web frontend. I would like to be able to play audio on the client via WebSockets. The reason is that the user already has a bunch of other interactions going over WebSockets, and I would like to use only one endpoint to keep track of the state.
The problem: the audio played via WebSockets is very choppy, whereas when I load the same file via the streaming /audio request it is just fine. So I am looking to make the two equivalent.
Here’s the backend:
from fastapi import APIRouter, WebSocket
from fastapi.responses import StreamingResponse

router = APIRouter()


@router.websocket('/audio_ws')
async def audio_sockets(ws: WebSocket):
    await ws.accept()
    CHUNK = 1024
    with open('paper_to_audio/data/paper.wav', 'rb') as file_like:
        while True:
            data = file_like.read(CHUNK)
            if data == b'':
                break
            await ws.send_bytes(data)
@router.get("/audio")
def read_audio():
    def iterfile():
        CHUNK = 1024
        with open('my_file.wav', 'rb') as file_like:
            while True:
                data = file_like.read(CHUNK)
                if data == b'':
                    break
                yield data

    return StreamingResponse(iterfile(), media_type="audio/wav")
As you can see, the logic for reading the file is basically identical.
Here’s the frontend code for reading from streaming, which works like a charm:
<audio preload="none" controls id="audio">
    <source src="/audio" type="audio/wav">
</audio>
Here’s the JavaScript for reading from WebSockets:
function playAudioFromBackend() {
    const sample_rate = 44100; // Hz
    // WebSocket URL
    const ws_url = "ws://localhost:8000/audio_ws";

    let audio_context = null;
    let ws = null;

    async function start() {
        if (ws != null) {
            return;
        }
        // Create an AudioContext that plays audio from the AudioWorkletNode
        audio_context = new AudioContext();
        await audio_context.audioWorklet.addModule('audioProcessor.js');
        const audioNode = new AudioWorkletNode(audio_context, 'audio-processor');
        audioNode.connect(audio_context.destination);

        // Set up the websocket
        ws = new WebSocket(ws_url);
        ws.binaryType = 'arraybuffer';

        // Process incoming messages
        ws.onmessage = (event) => {
            // Convert to Float32 LPCM, which is what the AudioWorkletNode expects
            const int16Array = new Int16Array(event.data);
            let float32Array = new Float32Array(int16Array.length);
            for (let i = 0; i < int16Array.length; i++) {
                float32Array[i] = int16Array[i] / 32768.;
            }
            // Send the audio data to the AudioWorkletNode
            audioNode.port.postMessage({ message: 'audioData', audioData: float32Array });
        };

        ws.onopen = () => {
            console.log('WebSocket connection opened.');
        };
        ws.onclose = () => {
            console.log('WebSocket connection closed.');
        };
        ws.onerror = error => {
            console.error('WebSocket error:', error);
        };
    }

    async function stop() {
        console.log('Stopping audio');
        if (audio_context) {
            await audio_context.close();
            audio_context = null;
            ws.close();
            ws = null;
        }
    }

    start();
}
And the associated worklet:
class AudioProcessor extends AudioWorkletProcessor {
    constructor() {
        super();
        this.buffer = new Float32Array();
        // Receive audio data from the main thread, and append it to the buffer
        this.port.onmessage = (event) => {
            let newFetchedData = new Float32Array(this.buffer.length + event.data.audioData.length);
            newFetchedData.set(this.buffer, 0);
            newFetchedData.set(event.data.audioData, this.buffer.length);
            this.buffer = newFetchedData;
        };
    }

    // Take a chunk from the buffer and send it to the output to be played
    process(inputs, outputs, parameters) {
        const output = outputs[0];
        const channel = output[0];
        const bufferLength = this.buffer.length;
        for (let i = 0; i < channel.length; i++) {
            channel[i] = (i < bufferLength) ? this.buffer[i] : 0;
        }
        this.buffer = this.buffer.slice(channel.length);
        return true;
    }
}

registerProcessor('audio-processor', AudioProcessor);
What am I doing wrong? Why is the audio choppy?
It looks like your code doesn’t use the sample_rate variable. It’s possible that your audio is 44.1 kHz but the AudioContext is running at 48 kHz. If you don’t account for that, there will not be enough samples to fill the audio output buffers: a 44.1 kHz stream consumed at 48 kHz comes up roughly 3,900 samples short every second, and your worklet fills those gaps with silence. The easiest way to avoid that is to force the AudioContext to run at the desired sample rate; it will then resample internally:
audio_context = new AudioContext({
    sampleRate: sample_rate
});
It’s also possible that the WebSocket connection doesn’t deliver the audio at the rate playback requires, so the worklet’s buffer occasionally runs dry and the output falls back to the zero samples you write in process(). One way to rule that out is to pace the sends on the server, as in the sketch below.
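Here is a rough sketch of what paced sending could look like for the endpoint above, so chunks go out at roughly real time instead of as fast as the read loop allows. The 16-bit mono, 44.1 kHz values are assumptions on my part and should really be read from the WAV header.

import asyncio
import time

from fastapi import APIRouter, WebSocket

router = APIRouter()

# Assumed format: 16-bit mono PCM at 44.1 kHz. In practice, read these
# values from the WAV header (e.g. with the wave module) instead of
# hard-coding them.
SAMPLE_RATE = 44100
BYTES_PER_SAMPLE = 2
CHUNK = 1024


@router.websocket('/audio_ws')
async def audio_sockets(ws: WebSocket):
    await ws.accept()
    start = time.monotonic()
    sent_samples = 0
    with open('paper_to_audio/data/paper.wav', 'rb') as file_like:
        while True:
            data = file_like.read(CHUNK)
            if data == b'':
                break
            await ws.send_bytes(data)
            # Pace the stream to roughly real time: sleep until the samples
            # sent so far would have finished playing on the client.
            sent_samples += len(data) // BYTES_PER_SAMPLE
            delay = start + sent_samples / SAMPLE_RATE - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)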