So I had the brilliant idea of running a Whisper model multiple times on the same audio and comparing the results to get the best accuracy. For testing purposes I went with the base model, since it's faster and, because its WER is higher, it may show a clearer improvement. Now the problem is, I have no clue how to compare those results. I want to compare the timestamps as well as the probabilities, and output either the mode or the result with the highest probability, whichever gives the better result.
import whisper
from whisper.utils import get_writer
from moviepy.editor import VideoFileClip
import os
def extract_audio_from_video(video_path, output_audio_path, volume_increase=2.0):
    """Extracts audio from a video file, increases the volume, and saves it as an MP3 file."""
    # Load the video file
    video_clip = VideoFileClip(video_path)
    # Extract audio from the video clip
    audio_clip = video_clip.audio
    # Increase the volume of the audio clip
    audio_clip = audio_clip.volumex(volume_increase)
    # Write the extracted and volume-increased audio to an MP3 file
    audio_clip.write_audiofile(output_audio_path, codec='mp3')
    # Close the clips to free up resources
    video_clip.close()
    audio_clip.close()

# Path to the video file
video_path = "..."  # assume a correct video path is added here
# Path where the extracted audio will be saved
output_audio_path = "Video_Audio.mp3"
# Extract audio from video, increase the volume, and save as MP3
extract_audio_from_video(video_path, output_audio_path)
model = whisper.load_model('base')
audio = whisper.load_audio("/content/Video_Audio.mp3")
result1 = whisper.transcribe(model, audio, word_timestamps=True)
result2 = whisper.transcribe(model, audio, word_timestamps=True)
result3 = whisper.transcribe(model, audio, word_timestamps=True)
result4 = whisper.transcribe(model, audio, word_timestamps=True)
result5 = whisper.transcribe(model, audio, word_timestamps=True)
This ran correctly and produced output. I tried many ways to compare the 5 results and output the most repeated one (the mode, if you're familiar with mathematics) and failed miserably. Note that the MP3 file is one hour long, in case that helps.
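To make the goal concrete, here is a minimal sketch of the kind of comparison described above. It is not a tested solution for a one-hour file. It assumes each result is the dict returned by `whisper.transcribe` with `word_timestamps=True`, so that every segment carries a `words` list whose entries have `word`, `start`, `end` and `probability` keys. The helper names (`words_of`, `mean_word_probability`, `pick_best_run`, `vote_per_word`) and the `bucket_seconds` parameter are hypothetical and exist only for illustration.

```python
from collections import Counter
from statistics import mean


def words_of(result):
    """Flatten one transcription result into a flat list of word dicts."""
    return [w for seg in result["segments"] for w in seg.get("words", [])]


def mean_word_probability(result):
    """Average per-word probability of one run (0.0 if it has no words)."""
    probs = [w["probability"] for w in words_of(result)]
    return mean(probs) if probs else 0.0


def pick_best_run(results):
    """Pick the run whose full text is the mode; break ties by mean probability."""
    counts = Counter(r["text"].strip() for r in results)
    return max(
        results,
        key=lambda r: (counts[r["text"].strip()], mean_word_probability(r)),
    )


def vote_per_word(results, bucket_seconds=0.5):
    """Crude word-level vote: group words from all runs by rounded start time,
    then keep the most frequent word in each group (ties go to the higher
    probability). Assumption: at most one word per run falls in a bucket."""
    buckets = {}
    for result in results:
        for w in words_of(result):
            key = round(w["start"] / bucket_seconds)
            buckets.setdefault(key, []).append(w)

    voted = []
    for key in sorted(buckets):
        group = buckets[key]
        counts = Counter(w["word"].strip().lower() for w in group)
        voted.append(
            max(
                group,
                key=lambda w: (counts[w["word"].strip().lower()], w["probability"]),
            )
        )
    return voted
```

With the five runs from the code above, this would be called as `pick_best_run([result1, result2, result3, result4, result5])` or `vote_per_word([...])`. Whole-text comparison will rarely find an exact mode on an hour of audio, since a single differing word makes two transcripts distinct, which is why the word-level vote is included. The fixed time buckets are a simplification: two short words from the same run that start within one bucket would be collapsed into one, so a proper alignment step would likely be needed for real use.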