I am using the Google Speech-to-Text service. Since the audio files in my project are longer than 60 seconds, I use the longRunningRecognize function. I run my tests on song files in mp3 and wav format, expecting to get the full lyrics of each song. The results I get are partial, often less than 30% of the lyrics.
Following the "optimize audio files" guidance, I tried to optimize the file. Here is the ffprobe output for my file:
Input #0, flac, from 'h.flac':
Metadata:
comment :
title : Charlotte Cardin-dirty dirty (lyrics)
encoded_by :
encoder : Lavf60.16.100
Duration: 00:03:20.39, start: 0.000000, bitrate: 165 kb/s
Stream #0:0: Audio: flac, 16000 Hz, 1 channels (FR), s16
After extracting the lyrics from the results I have ~300 characters, while the original lyrics have about 1400 characters.
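For reference, the character comparison can be sketched roughly as below. This is a minimal sketch, not my exact code: joinTranscript and coverage are hypothetical helper names, and it assumes the response has the usual shape of a results array where each entry carries an alternatives array with a transcript string.

```javascript
// Sketch: collect the top alternative of every result into one string.
// Assumes `response` looks like { results: [{ alternatives: [{ transcript }] }] }.
function joinTranscript(response) {
  return (response.results || [])
    .map((result) => {
      const top = result.alternatives && result.alternatives[0];
      return top && top.transcript ? top.transcript.trim() : '';
    })
    .filter((text) => text.length > 0) // skip results without usable text
    .join(' ');
}

// Ratio of recognized characters to reference characters (hypothetical metric).
function coverage(transcript, referenceLyrics) {
  if (referenceLyrics.length === 0) return 0;
  return transcript.length / referenceLyrics.length;
}
```

With ~300 recognized characters against ~1400 reference characters, coverage comes out at roughly 0.21.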
And here is the config I send to the API:
{
encoding: 'FLAC',
sampleRateHertz: 16000,
languageCode: 'en-US',
enableAutomaticPunctuation: true,
}
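To show where this config ends up, here is a rough sketch of the request. buildRequest is a hypothetical helper and the gs:// URI is a placeholder; I am assuming the audio is referenced from Cloud Storage and that the call goes through the @google-cloud/speech Node.js client, which is shown only in comments.

```javascript
// Hypothetical helper: wraps the config above in a request object.
// `gcsUri` is a placeholder such as 'gs://my-bucket/h.flac' (assumed, not my real bucket).
function buildRequest(gcsUri) {
  return {
    config: {
      encoding: 'FLAC',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
    },
    audio: { uri: gcsUri },
  };
}

// Usage with the client library (not executed here, requires credentials):
// const speech = require('@google-cloud/speech');
// const client = new speech.SpeechClient();
// const [operation] = await client.longRunningRecognize(buildRequest('gs://my-bucket/h.flac'));
// const [response] = await operation.promise();
```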
Are there any other optimizations I can make to get all of the lyrics?