I’m testing out the audio classification using the sample Android app here:
https://developers.google.com/codelabs/tflite-audio-classification-basic-android
However, when I test it with recorded audio, it never seems to classify the audio as Conversation, even though the model has a Conversation class.
When I am talking, or when one other person is talking with me (i.e., a conversation), the top result is always Speech, and sometimes its score is quite low even though the recording clearly contains a conversation. The app also returns other classifications with low scores, such as Burping or Breathing, that have no relation to the audio I recorded.
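For context, this is roughly how the classification results are obtained in the codelab, using the TFLite Task Library's AudioClassifier. This is a minimal sketch, not my exact code; the model file name and the 0.3 score threshold are placeholder values:

```kotlin
import android.content.Context
import org.tensorflow.lite.task.audio.classifier.AudioClassifier

// Sketch of the codelab's classification flow; assumes the YAMNet model is
// bundled as "yamnet.tflite" and RECORD_AUDIO permission has been granted.
fun classifyOnce(context: Context) {
    val classifier = AudioClassifier.createFromFile(context, "yamnet.tflite")
    val tensorAudio = classifier.createInputTensorAudio()

    // Capture a short buffer from the microphone and load it into the tensor.
    val record = classifier.createAudioRecord()
    record.startRecording()
    tensorAudio.load(record)
    record.stop()

    // Each Category carries a label (e.g. "Speech", "Conversation") and a score.
    val results = classifier.classify(tensorAudio)
    results.first().categories
        .filter { it.score > 0.3f }          // placeholder threshold to drop low-confidence labels
        .sortedByDescending { it.score }
        .forEach { println("${it.label}: ${it.score}") }
}
```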
Does this mean that the YAMNet model is not very accurate? I see that it was trained on AudioSet, which contains YouTube videos covering the various audio classes (https://research.google.com/audioset/index.html).
Thanks for any help.