I want to build a Ruby on Rails app that performs speaker recognition first and then speech recognition on live audio streamed from the microphone, and responds in real time. The audio data comes from the front end. What APIs are available for this? Are there any blogs or posts that have achieved a similar goal? I could not find any helpful material on this.
First I tried the Google Cloud Speech-to-Text API and built an app based on Action Cable. It transcribes the first chunk of audio correctly, but for the remaining chunks the API returns no result. After digging into it, I found that only the first chunk contains the header information, so the header needs to be added to each chunk. I tried that, but it did not work.
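To make the "header on each chunk" idea concrete, here is a minimal sketch of what it could look like for uncompressed audio. It is not the code from my app. It assumes each chunk is raw 16-bit little-endian mono PCM at 16 kHz, and the helper names `wav_header` and `wrap_chunk` are hypothetical. It only covers raw PCM wrapped in a standard 44-byte WAV header and says nothing about compressed container formats.

```ruby
# Hypothetical helper: build a minimal 44-byte WAV (RIFF) header for a
# chunk of raw PCM data. The defaults (16 kHz, mono, 16-bit) are assumptions.
def wav_header(data_size, sample_rate: 16_000, channels: 1, bits_per_sample: 16)
  byte_rate   = sample_rate * channels * bits_per_sample / 8
  block_align = channels * bits_per_sample / 8
  [
    "RIFF", 36 + data_size, "WAVE",      # RIFF chunk descriptor
    "fmt ", 16, 1, channels,             # fmt sub-chunk, 1 = PCM
    sample_rate, byte_rate, block_align, bits_per_sample,
    "data", data_size                    # data sub-chunk header
  ].pack("a4Va4a4VvvVVvva4V")
end

# Hypothetical helper: prepend a fresh header to one chunk so that the
# chunk is a valid standalone WAV file.
def wrap_chunk(pcm_bytes, **opts)
  wav_header(pcm_bytes.bytesize, **opts) + pcm_bytes.b
end
```

With this, `wrap_chunk(chunk)` would be called on every chunk received over Action Cable before it is sent on, so that no chunk depends on the header of the first one.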
Furthermore, I also need speaker recognition, and as far as I can tell Google doesn't provide such an API. Azure Speaker Recognition could be an option, but I'm not sure whether it supports live-streamed audio. The audio data also has to be in a format supported by both the Google Cloud Speech-to-Text API and Azure Speaker Recognition: the Azure API needs uncompressed data, whereas the Google Cloud API needs compressed data. So I'm not sure what the right approach is here.