I record audio as a blob and send it to several speech-to-text APIs, such as OpenAI's Whisper and Google Cloud Speech-to-Text. After recording, I convert the blob to a base64 string, as the API documentation requires. When I record with the MediaRecorder API, it works in all browsers except Safari. I then switched to another library, RecordRTC, and that works in all browsers, but Google Speech-to-Text now fails in every browser with a Bad Request error. I can only assume this is an encoding issue, but if both libraries produce WAV, what could the problem be? I pass the base64 string from my React front end to a cloud function written in TypeScript.
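One thing that may be worth ruling out at the conversion step: if the base64 string is produced with FileReader.readAsDataURL (an assumption, since the conversion code is not shown here), it starts with a prefix such as data:audio/wav;base64, that is not part of the audio itself. A minimal sketch of a helper that strips such a prefix before the string is sent on; the name toRawBase64 is hypothetical:

```typescript
// Hypothetical helper: returns only the base64 payload of a data URL.
// A string that is already raw base64 is returned unchanged.
function toRawBase64(input: string): string {
  const marker = ";base64,";
  // Only treat the input as a data URL if it actually starts with "data:".
  const idx = input.indexOf("data:") === 0 ? input.indexOf(marker) : -1;
  return idx === -1 ? input : input.slice(idx + marker.length);
}
```

The function is pure, so it can run either in the React front end or in the cloud function before the request is built.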
I printed the base64 strings from both libraries and there was a pretty big difference, though I'm not sure whether it is relevant. The MediaRecorder string looked normal, while the RecordRTC one starts with a few characters, then has about 200 A's, and then other characters again. This wasn't a problem for the OpenAI API, but it was for Google. It is also worth noting that the base64 string from MediaRecorder failed too when I sent it to Google Cloud. How should I proceed?
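To check whether both strings really contain WAV, the first bytes of each can be decoded and inspected. The sketch below is only an illustration under stated assumptions: the input must be raw base64 (no data URL prefix), the WAV fields are read only when the "fmt " chunk sits at the usual offset 12, and the names decodeBase64Prefix, describeAudio and AudioInfo are hypothetical. It recognises the RIFF/WAVE, EBML (WebM/Matroska) and Ogg signatures.

```typescript
// Fields are filled in only for WAV files with a standard header layout.
interface AudioInfo {
  container: "wav" | "webm" | "ogg" | "unknown";
  sampleRateHertz?: number;
  channels?: number;
  bitsPerSample?: number;
}

// Decode at most `maxBytes` bytes from the start of a raw base64 string.
function decodeBase64Prefix(b64: string, maxBytes: number): Uint8Array {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < b64.length && out.length < maxBytes; i++) {
    const value = alphabet.indexOf(b64.charAt(i));
    if (value === -1) continue; // skip "=" padding and whitespace
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return new Uint8Array(out);
}

// Read `length` bytes starting at `start` as an ASCII string.
function ascii(bytes: Uint8Array, start: number, length: number): string {
  let s = "";
  for (let i = start; i < start + length && i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

function describeAudio(b64: string): AudioInfo {
  const bytes = decodeBase64Prefix(b64, 64);
  if (ascii(bytes, 0, 4) === "RIFF" && ascii(bytes, 8, 4) === "WAVE") {
    // Assumption: "fmt " is the first chunk; otherwise report the container only.
    if (ascii(bytes, 12, 4) !== "fmt " || bytes.length < 36) {
      return { container: "wav" };
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return {
      container: "wav",
      channels: view.getUint16(22, true), // WAV fields are little-endian
      sampleRateHertz: view.getUint32(24, true),
      bitsPerSample: view.getUint16(34, true),
    };
  }
  // EBML magic number, used by WebM and Matroska files.
  if (bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return { container: "webm" };
  }
  if (ascii(bytes, 0, 4) === "OggS") return { container: "ogg" };
  return { container: "unknown" };
}
```

Logging describeAudio(base64String) for the MediaRecorder and RecordRTC outputs would show whether the container, sample rate and channel count are what the request sent to Google declares. As for the long run of A's: each base64 A encodes six zero bits, so such a run decodes to zero-valued bytes, which in uncompressed 16-bit PCM would simply be silence at the start of the recording.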
Rida Darwish is a new contributor to this site. Take care in asking for clarification, commenting, and answering.