I am new to the audio world on both the frontend and the backend. My goal is to use the OpenAI API to convert text on the client into audio that is played when the user clicks the text. I am using NestJS for my backend with an Angular frontend. So far I think I have been able to do the following:
- Generate text on the client that I want audio to be created from
- Send that text to the server so it can generate audio from it
- Have the server stream the audio in the response so the client should be able to start playing the audio before the full audio has been created
- Get the ArrayBuffer/Blob into the client and use an `Audio` object with its source set to that Blob
My issue is that I cannot actually stream the audio this way when using Angular’s HttpClient to make the POST request that returns the audio: I only get access to the finished response, which is undesirable. I have seen other devs use presigned URLs so that they can set the src of the audio element directly to that URL, but I am unfamiliar with generating presigned URLs. I am currently handling this in what seems like an interesting way, and the streaming seems to be working, but I'm unsure whether it is even a “good” way to do it.
- my client makes a POST request with a body of { text: string }
- my server generates a uuid, puts a new entry in a local map to store the text under that uuid, and returns `{ url: my-server-url.com/stream/${uuid} }`
- my client receives that url from the backend, then immediately creates a `new Audio(url)` and sets it to autoplay
- server takes the uuid from the url params and looks up the text in the map using the uuid. If found, it deletes the entry from the map. It then returns the streamed response from OpenAI's speech API to the client.
- since the response goes directly to the audio element it all seems to work
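To make the flow above concrete, here is a minimal sketch of the map-based part as I understand it. `TicketStore`, `issue`, `redeem`, and the 30-second TTL are hypothetical names and values chosen for illustration (the expiry is an addition, not something my current code does). The NestJS controllers and the OpenAI call are left out.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical one-time "ticket" store; names and TTL are illustrative only.
interface Ticket {
  text: string;
  expiresAt: number; // epoch milliseconds
}

class TicketStore {
  private readonly tickets = new Map<string, Ticket>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 30_000) {
    this.ttlMs = ttlMs;
  }

  // Called from the POST handler: remember the text, return the id for the URL.
  issue(text: string, now: number = Date.now()): string {
    const id = randomUUID();
    this.tickets.set(id, { text, expiresAt: now + this.ttlMs });
    return id;
  }

  // Called from the GET /stream/:uuid handler: each ticket works at most once.
  redeem(id: string, now: number = Date.now()): string | undefined {
    const ticket = this.tickets.get(id);
    if (ticket === undefined) return undefined;
    this.tickets.delete(id); // delete first so a replayed URL finds nothing
    return now <= ticket.expiresAt ? ticket.text : undefined;
  }
}
```

The POST handler would return `/stream/${store.issue(text)}`, and the stream handler would respond with 404 whenever `redeem` returns `undefined`. Tickets that are never redeemed stay in the map in this sketch, so a periodic cleanup would probably be needed as well.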
This seems to be fine, but the only problem I can see is that if I ever lock down the /stream/${uuid} endpoint, there does not seem to be a way to get the audio element to attach an auth header.
Is this similar to a presigned url approach? Would it really be that bad if this endpoint did not require an authorization header in order to be used (as nothing would happen if the uuid sent was not located in the map at that time)?
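For comparison, my rough understanding is that a presigned URL carries an expiry and a signature in the query string, so the server can verify it without keeping any state and without an auth header. Below is a minimal sketch of that idea using an HMAC. `signUrl`, `verifyUrl`, the `expires` and `sig` parameter names, and the `SECRET` placeholder are all assumptions for illustration, not any particular provider's format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Placeholder secret; a real deployment would load this from configuration.
const SECRET = "replace-me";

// HMAC-SHA256 over the path and expiry, hex-encoded.
function sign(path: string, expires: number, secret: string): string {
  return createHmac("sha256", secret).update(`${path}:${expires}`).digest("hex");
}

// Builds something like /stream/abc?expires=1700000000&sig=<hex>
function signUrl(path: string, expires: number, secret: string = SECRET): string {
  return `${path}?expires=${expires}&sig=${sign(path, expires, secret)}`;
}

// Returns true only if the link has not expired and the signature matches.
function verifyUrl(
  path: string,
  expires: number,
  sig: string,
  nowSeconds: number,
  secret: string = SECRET,
): boolean {
  if (!Number.isFinite(expires) || nowSeconds > expires) return false;
  const expected = Buffer.from(sign(path, expires, secret), "hex");
  const given = Buffer.from(sig, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Unlike my map approach, a link signed this way can be reused until it expires, so it trades the single-use guarantee for not having to store anything on the server.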
Like I said, I am new at this, so any advice would be great. I'm also using media/opus since it is supposedly decent for streaming in this way, but I would be open to looking into other media types if it is not the best choice.