I’m working on a project where audio data is streamed to a device. The audio is encoded with Opus and streamed in 20 ms payloads. The streaming is done over TCP to avoid packet loss entirely. The goal is to get as close as possible to live audio streaming, without audio loss or jitter.
Currently, on slower Internet connections, the audio jitters a little. I am not using any buffers at the moment, but I want to keep the stream as close as possible to “live streaming” while eliminating the jitter.
I’ve looked into jitter buffers, and it seems they are also meant to handle delays on both ends so that sender and receiver stay as in sync as possible, which sounds like overkill for my situation. I’m afraid that a static buffer size will take away from the live-streaming aspect if it isn’t necessary.
This leaves me with a few related questions:
- What is a good method or algorithm for determining the buffer length?
- When should I start feeding data to the decoder on the receiving end? Should it be once the buffer holds a certain number of milliseconds, feeding data in 20 ms payloads from there?
- Should I delay playback if the buffer underruns?
- Should the buffer be measured in bytes or in time?
Thanks so much!
It depends entirely on the throughput of your network: if you can keep a single second of data buffered, that’s all you need. You have to determine how long your network is likely to stall and miss filling the buffer. Test it and see.
Otherwise, it might be easier to make the buffer size configurable: fast networks can use a one-second buffer (nobody will notice audio one second behind capture), and slow, high-latency, or low-throughput networks can buffer more. You might be able to grow the buffer during playback if it ever empties completely, but in that case playback will more likely than not already be stuttering continually.
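As a rough illustration only, here is a minimal pre-roll buffer sketch in Python; the names (`PrerollBuffer`, `prefill_ms`) are made up for the example. It queues 20 ms payloads, only reports itself ready once a configurable number of milliseconds has accumulated, and drops back into the filling state if it ever runs dry:

```python
from collections import deque
import threading

FRAME_MS = 20  # one Opus payload covers 20 ms of audio

class PrerollBuffer:
    def __init__(self, prefill_ms=500):
        self.frames = deque()                         # each entry is one 20 ms payload
        self.prefill_frames = prefill_ms // FRAME_MS  # target depth before playback starts
        self.filling = True                           # True until the prefill target is reached
        self.lock = threading.Lock()

    def push(self, payload: bytes):
        with self.lock:
            self.frames.append(payload)
            if self.filling and len(self.frames) >= self.prefill_frames:
                self.filling = False                  # enough audio buffered; playback may start

    def pop(self):
        """Return the next 20 ms payload, or None while (re)filling."""
        with self.lock:
            if not self.frames:
                self.filling = True                   # underrun: go back to filling
                return None
            if self.filling:
                return None                           # still building up the prefill
            return self.frames.popleft()
```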
Generally you only delay playback if the buffer empties completely. There’s no point in having a buffer if you don’t use it.
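To make that concrete, here is a sketch of how the receiving side might drive the buffer above; `recv_payload()` and `play_frame()` are hypothetical placeholders for your own TCP framing and Opus decode/output code. Playback only pauses while `pop()` reports that the buffer has emptied and is refilling:

```python
import time

# Assumes PrerollBuffer and FRAME_MS from the sketch above.
# recv_payload() and play_frame() are placeholders for your own code.

buf = PrerollBuffer(prefill_ms=500)

def receiver_loop(sock):
    while True:
        payload = recv_payload(sock)   # read one length-prefixed 20 ms payload from TCP
        buf.push(payload)

def playback_loop():
    while True:
        frame = buf.pop()
        if frame is None:
            # Startup or underrun: the buffer is refilling, so wait one frame
            # interval (or play silence) rather than starving the audio device.
            time.sleep(FRAME_MS / 1000)
            continue
        play_frame(frame)              # decode with Opus and hand to the audio output
```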
If your audio is in fixed 20 ms packets, then size == time: counting buffered packets and counting buffered milliseconds amount to the same thing.
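For instance, with a fixed 20 ms frame size the two measures convert trivially (hypothetical helper names):

```python
FRAME_MS = 20  # one Opus payload covers 20 ms of audio

def frames_to_ms(frame_count: int) -> int:
    return frame_count * FRAME_MS

def ms_to_frames(duration_ms: int) -> int:
    return duration_ms // FRAME_MS
```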