I'm trying to implement model adaptation (speech adaptation) in my transcription system using Chirp. The idea is to improve transcription accuracy by guiding the model with specific phrases and terms. However, the adaptation part is not working as expected.

Here is the relevant code:
```python
import io
from google.cloud import speech

def transcribe_audio_gemini_ai(file_path):
    try:
        client = speech.SpeechClient()
        # Read the audio file
        with io.open(file_path, "rb") as audio_file:
            content = audio_file.read()
        # Configure audio settings
        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            enable_automatic_punctuation=True,
            # encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=48000,
            language_code="en-IN",
            # Speech adaptation: boost domain-specific phrases
            speech_contexts=[speech.SpeechContext(
                phrases=["Hello, lets do one thing", "Staturio tile",
                         "Anti Skid", "Highlighter grey dark"],
                boost=15.0)]
        )
        response = client.recognize(config=config, audio=audio)
        if not response.results:
            print('--->>', response)
            print("No transcription results from Gemini")
        # Print the results
        for result in response.results:
            print("Google Transcript: {}".format(result.alternatives[0].transcript))
    except Exception as e:
        print('Error with chirp audio transcription', e)
```
I’m using v2 of the API, and the transcription results are not as expected. I’ve commented out the speech_contexts part here, but even when it’s included, it doesn’t seem to improve the accuracy.
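For reference, this is roughly how I understand inline speech adaptation is supposed to be set up with the v2 client (`google.cloud.speech_v2`). It is only a sketch based on my reading of the docs, not working code; `project_id`, `location`, and the audio path are placeholders:

```python
# Sketch only: inline speech adaptation with the v2 client, as I understand it.
# project_id and location are placeholders; Chirp appears to need a regional
# endpoint (e.g. "us-central1"), so the client is pointed at that region.
from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

project_id = "my-project"   # placeholder
location = "us-central1"    # placeholder

client = SpeechClient(
    client_options=ClientOptions(api_endpoint=f"{location}-speech.googleapis.com")
)

with open("audio.wav", "rb") as f:  # placeholder audio path
    content = f.read()

# Inline phrase set with boosted domain phrases (the v2 counterpart of speech_contexts)
adaptation = cloud_speech.SpeechAdaptation(
    phrase_sets=[
        cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
            inline_phrase_set=cloud_speech.PhraseSet(
                phrases=[
                    cloud_speech.PhraseSet.Phrase(value="Staturio tile", boost=15.0),
                    cloud_speech.PhraseSet.Phrase(value="Anti Skid", boost=15.0),
                ]
            )
        )
    ]
)

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-IN"],
    model="chirp",
    adaptation=adaptation,
    features=cloud_speech.RecognitionFeatures(enable_automatic_punctuation=True),
)

request = cloud_speech.RecognizeRequest(
    recognizer=f"projects/{project_id}/locations/{location}/recognizers/_",
    config=config,
    content=content,
)
response = client.recognize(request=request)
for result in response.results:
    print(result.alternatives[0].transcript)
```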
Has anyone experienced similar issues, or does anyone have advice on how to correctly implement model adaptation to get better transcription results?
Thank you for your help!