I am working on a project where I have retrieved a set of highlight videos from a basketball game. The highlights cover every made basket, assist, block, etc. Because together they run about 15 to 20 minutes, I want to implement an algorithm that identifies the most important ones. One of my ideas was to use audio analysis: the more exciting the highlight, the more likely it is that the crowd or the commentator gets louder.
After some investigation I found that this is not so easy, but I implemented two functions that measure 1) loudness and 2) peak dB for each highlight video.
- Loudness:
```python
import soundfile as sf
import pyloudnorm as pyln

def extract_video_loudness(self, audio_filename):
    # Measure the integrated loudness (in LUFS) of the highlight's audio track
    data, rate = sf.read(audio_filename)  # load audio: shape (samples,) for mono, (samples, channels) otherwise
    meter = pyln.Meter(rate)  # create a BS.1770 meter
    loudness = meter.integrated_loudness(data)  # measure integrated loudness
    return loudness
```
- peak_dB:
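```python
import math
import struct
import wave

def extract_audio_peak(self, audio_filename):
    # Open the WAV file (the wave module reads uncompressed PCM WAV only)
    with wave.open(audio_filename, 'rb') as audio:
        sample_width = audio.getsampwidth()
        # Extract the raw audio data (interleaved frames, all channels)
        raw_data = audio.readframes(audio.getnframes())
    # struct's 'h' code is a signed 16-bit integer, so only 16-bit PCM is handled here
    if sample_width != 2:
        raise ValueError('expected 16-bit PCM audio, got %d-byte samples' % sample_width)
    # Convert the raw little-endian data to integers: one sample per 2 bytes,
    # which works for mono and stereo alike (no hard-coded channel count)
    samples = struct.unpack('<{n}h'.format(n=len(raw_data) // 2), raw_data)
    # The peak is the largest absolute sample, so negative excursions count too
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float('-inf')  # silent or empty file: no finite dBFS value
    # Full-scale reference value for signed PCM at this bit depth
    reference_value = 2 ** (sample_width * 8 - 1)
    # Peak level in dBFS, relative to the maximum possible sample value
    peak_dB = 20 * math.log10(peak / reference_value)
    return peak_dB
```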
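For reference, the quantity the peak function computes, writing $x[n]$ for the integer samples and $b$ for the bit depth, is:

$$
\text{peak}_{\text{dBFS}} = 20 \log_{10} \frac{\max_n |x[n]|}{2^{b-1}}
$$

As a worked example, for 16-bit audio ($2^{15} = 32768$) whose largest absolute sample is 16384, this gives $20 \log_{10}(16384 / 32768) = 20 \log_{10}(0.5) \approx -6.02$ dBFS. Because the value depends on a single sample, one short transient anywhere in the clip can set it for the whole highlight.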
I use both return values to sort the highlights by importance: the higher the loudness and peak_dB, the more important the highlight. However, although the code technically works, the results don't match what I expect when I test them. For example, some videos where the crowd in the arena and the commentator go wild over an impressive moment are measured as not particularly loud, and their peak_dB is about average compared with the other videos. Meanwhile, some routine plays show very high loudness and peak_dB, even though when I watch them, neither the crowd nor the commentator is making much noise. So it seems I can't rely on this code to pick out the most important highlights.
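One possible direction, sketched here under assumptions rather than as a known fix: integrated loudness averages over the whole clip and peak dB comes from a single sample, so a few seconds of crowd reaction can be diluted by the rest of the clip, and clips may have been mixed at different overall levels. The sketch below swaps in a different measure, windowed RMS level, and scores each clip by how far its loudest short window rises above the clip's own median window. It assumes mono float samples in [-1, 1] (for example, the output of `sf.read` averaged across channels); it uses plain RMS with no K-weighting, so it is not BS.1770 loudness; the 0.5 s window and 0.25 s hop are arbitrary; and the names `window_levels_db`, `excitement_score`, and `rank_highlights` are hypothetical.

```python
import math
import statistics

def window_levels_db(samples, rate, window_s=0.5, hop_s=0.25):
    """RMS level in dBFS of each sliding window over mono float samples in [-1, 1]."""
    if len(samples) == 0:
        raise ValueError("empty clip")
    win = max(1, int(window_s * rate))
    hop = max(1, int(hop_s * rate))
    levels = []
    # If the clip is shorter than one window, the single window is the whole clip
    for start in range(0, max(1, len(samples) - win + 1), hop):
        chunk = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        # Floor at 1e-6 (-120 dBFS) so silent windows do not hit log10(0)
        levels.append(20 * math.log10(max(rms, 1e-6)))
    return levels

def excitement_score(samples, rate):
    """Hypothetical score: dB by which the loudest window exceeds the clip's median window."""
    levels = window_levels_db(samples, rate)
    return max(levels) - statistics.median(levels)

def rank_highlights(clips, rate):
    """Sort a {name: samples} dict of clips from most to least 'exciting'."""
    return sorted(clips, key=lambda name: excitement_score(clips[name], rate), reverse=True)
```

For instance, a 4 s clip that is quiet except for a 1 s burst ten times louder scores about 20 dB, while a clip held at a constant, higher level scores about 0 dB, even though the steady clip carries more energy overall. Whether crowd excitement actually shows up as such a burst in your footage is an assumption worth checking on a few hand-picked clips.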
So my question is: could you help me with ideas on what else I could use, or on how to tweak my existing code to get more precise results?
I also had an idea about using some AI approach, but I have never written any AI code and could not find anything online that helped. I would appreciate any ideas or information!
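If the machine-learning route is of interest, one common starting point (an assumption about what might suit this project, not something tested on it) is supervised classification: hand-label a set of highlights as important (1) or not (0), compute a few numeric features per clip, such as the loudness and peak_dB values from the functions above, and fit a model that predicts the label. The sketch below is plain logistic regression trained by batch gradient descent in pure Python; the function names, learning rate, and epoch count are hypothetical choices, and in practice a library class such as scikit-learn's `LogisticRegression` does the same job.

```python
import math
import statistics

def standardize(features):
    """Rescale each feature column to zero mean and unit variance.
    Returns the scaled rows plus the means/stds needed to scale new clips the same way."""
    cols = list(zip(*features))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]
    return scaled, means, stds

def predict_proba(weights, bias, x):
    """Probability that a clip with feature vector x is an 'important' highlight."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-500.0, min(500.0, z))  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss.
    features: list of equal-length lists of floats (ideally standardized);
    labels: list of 0/1 values (1 = hand-labelled as important)."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y  # gradient of log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

Standardizing first matters because features such as LUFS and dBFS sit on different scales, and gradient descent converges poorly when one feature dominates. With only a handful of labelled clips, any such model should be treated as a rough experiment rather than a reliable ranking.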