
Question

I have a script that compares two CSV files created by PySceneDetect. Each file lists the scene changes detected in one video. Because the videos show the same content with slightly altered video, I compare the scene changes to find the best offset between them. I then use that offset to adjust the audio or video track when muxing, producing a file with the video from one source and the audio from the other.
Here is an example of the files I’m comparing:

Timecode List:,00:00:02.042,00:00:09.000,00:00:11.958,00:00:13.917,00:00:15.875,00:00:18.833,00:00:22.333,00:00:25.292,00:00:30.250,00:00:32.458,00:00:37.333,00:00:40.917,00:00:45.875,00:00:50.833,00:00:55.792,00:00:57.417,00:01:00.375,00:01:04.875,00:01:08.917,00:01:16.125,00:01:21.000,00:01:23.958,00:01:24.875,00:01:28.875
Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,49,00:00:02.042,2.042,49,00:00:02.042,2.042
2,50,00:00:02.042,2.042,216,00:00:09.000,9.000,167,00:00:06.958,6.958
3,217,00:00:09.000,9.000,287,00:00:11.958,11.958,71,00:00:02.958,2.958
4,288,00:00:11.958,11.958,334,00:00:13.917,13.917,47,00:00:01.958,1.958
5,335,00:00:13.917,13.917,381,00:00:15.875,15.875,47,00:00:01.958,1.958
6,382,00:00:15.875,15.875,452,00:00:18.833,18.833,71,00:00:02.958,2.958
7,453,00:00:18.833,18.833,536,00:00:22.333,22.333,84,00:00:03.500,3.500
8,537,00:00:22.333,22.333,607,00:00:25.292,25.292,71,00:00:02.958,2.958
9,608,00:00:25.292,25.292,726,00:00:30.250,30.250,119,00:00:04.958,4.958
10,727,00:00:30.250,30.250,779,00:00:32.458,32.458,53,00:00:02.208,2.208
11,780,00:00:32.458,32.458,896,00:00:37.333,37.333,117,00:00:04.875,4.875
12,897,00:00:37.333,37.333,982,00:00:40.917,40.917,86,00:00:03.583,3.583
13,983,00:00:40.917,40.917,1101,00:00:45.875,45.875,119,00:00:04.958,4.958
14,1102,00:00:45.875,45.875,1220,00:00:50.833,50.833,119,00:00:04.958,4.958
15,1221,00:00:50.833,50.833,1339,00:00:55.792,55.792,119,00:00:04.958,4.958
16,1340,00:00:55.792,55.792,1378,00:00:57.417,57.417,39,00:00:01.625,1.625
17,1379,00:00:57.417,57.417,1449,00:01:00.375,60.375,71,00:00:02.958,2.958
18,1450,00:01:00.375,60.375,1557,00:01:04.875,64.875,108,00:00:04.500,4.500
19,1558,00:01:04.875,64.875,1654,00:01:08.917,68.917,97,00:00:04.042,4.042
20,1655,00:01:08.917,68.917,1827,00:01:16.125,76.125,173,00:00:07.208,7.208
21,1828,00:01:16.125,76.125,1944,00:01:21.000,81.000,117,00:00:04.875,4.875
22,1945,00:01:21.000,81.000,2015,00:01:23.958,83.958,71,00:00:02.958,2.958
23,2016,00:01:23.958,83.958,2037,00:01:24.875,84.875,22,00:00:00.917,0.917
24,2038,00:01:24.875,84.875,2133,00:01:28.875,88.875,96,00:00:04.000,4.000
25,2134,00:01:28.875,88.875,2171,00:01:30.458,90.458,38,00:00:01.583,1.583

and

Timecode List:,00:00:02.000,00:00:08.958,00:00:11.917,00:00:13.875,00:00:15.833,00:00:18.792,00:00:22.292,00:00:25.250,00:00:30.208,00:00:32.417,00:00:37.292,00:00:40.875,00:00:45.833,00:00:50.792,00:00:55.750,00:00:57.375,00:01:00.333,00:01:04.833,00:01:08.875,00:01:16.083,00:01:20.958,00:01:23.917,00:01:24.833,00:01:28.833
Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,48,00:00:02.000,2.000,48,00:00:02.000,2.000
2,49,00:00:02.000,2.000,215,00:00:08.958,8.958,167,00:00:06.958,6.958
3,216,00:00:08.958,8.958,286,00:00:11.917,11.917,71,00:00:02.958,2.958
4,287,00:00:11.917,11.917,333,00:00:13.875,13.875,47,00:00:01.958,1.958
5,334,00:00:13.875,13.875,380,00:00:15.833,15.833,47,00:00:01.958,1.958
6,381,00:00:15.833,15.833,451,00:00:18.792,18.792,71,00:00:02.958,2.958
7,452,00:00:18.792,18.792,535,00:00:22.292,22.292,84,00:00:03.500,3.500
8,536,00:00:22.292,22.292,606,00:00:25.250,25.250,71,00:00:02.958,2.958
9,607,00:00:25.250,25.250,725,00:00:30.208,30.208,119,00:00:04.958,4.958
10,726,00:00:30.208,30.208,778,00:00:32.417,32.417,53,00:00:02.208,2.208
11,779,00:00:32.417,32.417,895,00:00:37.292,37.292,117,00:00:04.875,4.875
12,896,00:00:37.292,37.292,981,00:00:40.875,40.875,86,00:00:03.583,3.583
13,982,00:00:40.875,40.875,1100,00:00:45.833,45.833,119,00:00:04.958,4.958
14,1101,00:00:45.833,45.833,1219,00:00:50.792,50.792,119,00:00:04.958,4.958
15,1220,00:00:50.792,50.792,1338,00:00:55.750,55.750,119,00:00:04.958,4.958
16,1339,00:00:55.750,55.750,1377,00:00:57.375,57.375,39,00:00:01.625,1.625
17,1378,00:00:57.375,57.375,1448,00:01:00.333,60.333,71,00:00:02.958,2.958
18,1449,00:01:00.333,60.333,1556,00:01:04.833,64.833,108,00:00:04.500,4.500
19,1557,00:01:04.833,64.833,1653,00:01:08.875,68.875,97,00:00:04.042,4.042
20,1654,00:01:08.875,68.875,1826,00:01:16.083,76.083,173,00:00:07.208,7.208
21,1827,00:01:16.083,76.083,1943,00:01:20.958,80.958,117,00:00:04.875,4.875
22,1944,00:01:20.958,80.958,2014,00:01:23.917,83.917,71,00:00:02.958,2.958
23,2015,00:01:23.917,83.917,2036,00:01:24.833,84.833,22,00:00:00.917,0.917
24,2037,00:01:24.833,84.833,2132,00:01:28.833,88.833,96,00:00:04.000,4.000
25,2133,00:01:28.833,88.833,2170,00:01:30.417,90.417,38,00:00:01.583,1.583

I run the following Python script to compare them:

import csv
import sys
from collections import Counter

def extract_timecodes(file_path):
    timecodes = []
    with open(file_path, 'r', newline='') as csvfile:  # open the file once; newline='' is what the csv module expects
        reader = csv.reader(csvfile)
        next(reader)  # Skip the "Timecode List:" row
        next(reader)  # Skip the column header row
        for row in reader:
            timecodes.append(row[2])  # Start Timecode in the 3rd column
    return timecodes

def timecode_to_seconds(timecode):
    h, m, s = map(float, timecode.split(':'))
    return h * 3600 + m * 60 + s

def compare_timecodes(tc1, tc2):
    return abs(tc1 - tc2)

def main(file1, file2):
    timecodes1 = extract_timecodes(file1)
    timecodes2 = extract_timecodes(file2)

    # Debugging: Print extracted timecodes
    print("Timecodes from file 1:")
    print(timecodes1)
    print("Timecodes from file 2:")
    print(timecodes2)

    differences = []
    for i in range(1, min(len(timecodes1), len(timecodes2))):  # Ignoring the first value which should always be 0
        tc1 = timecode_to_seconds(timecodes1[i])
        tc2 = timecode_to_seconds(timecodes2[i])
        diff = compare_timecodes(tc1, tc2)
        differences.append(round(diff, 3))

    # Debugging: Print calculated differences
    print("Calculated differences:")
    print(differences)
    
    mean_diff = sum(differences) / len(differences) if differences else 0
    mode_diff = Counter(differences).most_common(1)[0] if differences else (0, 0)
    # Keep the five most frequent differences (most_common is already sorted by count)
    freq_diffs = Counter(differences).most_common(5)

    # Output the results
    print("Differences:")
    for i, diff in enumerate(differences, 2):  # Starting from 2 to ignore beginning 0
        print(f"{i:02}: {diff:.3f}")

    print(" - - Average - - ")
    print(f"Mean: {mean_diff:.5f}")
    print(f"Mode: {mode_diff[0]:.3f}")

    print(" - - Frequency - - ")
    for val, freq in freq_diffs:
        print(f"{val:.3f}({freq})")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python compare_scenes.py <file1> <file2>")
    else:
        main(sys.argv[1], sys.argv[2])

Most of the time this works great, but sometimes there’s an unexpected scene change detected in one file but not in the other. This causes the comparison to show a ton of random numbers because the values no longer align.
An example where the scenes line up:

| Scene | File1 | File2 | Offset | Length1 | Length2 |
| :-: | :-: | :-: | :-: | :-: | :-: |
| 1 | 00:00:00.000 | 00:00:00.000 | 0.000 | 0.000 | 0.000 |
| 2 | 00:00:02.042 | 00:00:02.000 | 0.042 | 2.042 | 2.000 |
| 3 | 00:00:09.000 | 00:00:08.958 | 0.042 | 6.958 | 6.958 |
| 4 | 00:00:11.958 | 00:00:11.917 | 0.041 | 2.958 | 2.959 |
| 5 | 00:00:13.917 | 00:00:13.875 | 0.042 | 1.959 | 1.958 |
| 6 | 00:00:15.875 | 00:00:15.833 | 0.042 | 1.958 | 1.958 |
| 7 | 00:00:18.833 | 00:00:18.792 | 0.041 | 2.958 | 2.959 |
| 8 | 00:00:22.333 | 00:00:22.292 | 0.041 | 3.500 | 3.500 |
| 9 | 00:00:25.292 | 00:00:25.250 | 0.042 | 2.959 | 2.958 |
| 10 | 00:00:30.250 | 00:00:30.208 | 0.042 | 4.958 | 4.958 |

But let's say it detected another scene change in File2 that wasn’t in File1:

| Scene | File1 | File2 | Offset | Length1 | Length2 |
| :-: | :-: | :-: | :-: | :-: | :-: |
| 1 | 00:00:00.000 | 00:00:00.000 | 0.000 | 0.000 | 0.000 |
| 2 | 00:00:02.042 | 00:00:02.000 | 0.042 | 2.042 | 2.000 |
| 3 | 00:00:09.000 | 00:00:08.958 | 0.042 | 6.958 | 6.958 |
| 4 | 00:00:11.958 | 00:00:09.937 | 2.021 | 2.958 | 0.979 |
| 5 | 00:00:13.917 | 00:00:11.917 | 2.000 | 1.959 | 1.980 |
| 6 | 00:00:15.875 | 00:00:13.875 | 2.000 | 1.958 | 1.958 |
| 7 | 00:00:18.833 | 00:00:15.833 | 3.000 | 2.958 | 1.958 |
| 8 | 00:00:22.333 | 00:00:18.792 | 3.541 | 3.500 | 2.959 |
| 9 | 00:00:25.292 | 00:00:22.292 | 3.000 | 2.959 | 3.500 |
| 10 | 00:00:30.250 | 00:00:25.250 | 5.000 | 4.958 | 2.958 |

Once the unexpected scene change is detected, every row after it is useless because the scenes no longer line up.

I’m trying to find a way to compare the lengths of the scenes and, if they don’t match, remove or ignore the row from the file that has the extra scene change. Put another way, the two short scenes would be combined into a single one.

I have tried asking ChatGPT to help me make that adjustment, but every attempt ends in failure.

What logic can I use to compare the time between scenes 1 and 2 in each file and adjust the alignment if the two lengths do not match, at least approximately?

I was thinking 100 ms should be close enough to count as a match: in good files I have seen no more than 10 ms of difference, while in bad files the difference is always at least 400 ms.
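
The idea described above (compare scene lengths and fold an extra scene into its neighbour when they disagree) could be sketched roughly as follows. This is only a minimal sketch, not a tested solution: the function name `align_cuts` and the `tolerance` parameter are hypothetical, the sample values are the first cuts from the tables above (including the made-up extra cut at 9.937), and it assumes the first cut in each list is the same scene change in both videos.

```python
def align_cuts(cuts1, cuts2, tolerance=0.100):
    """Pair up cut times (in seconds) that appear to be the same scene change.

    Assumption: cuts1[0] and cuts2[0] are the same cut in both videos.
    A cut that exists in only one list is skipped, which is the same as
    merging the two short scenes around it into one longer scene.
    """
    pairs = []
    if not cuts1 or not cuts2:
        return pairs

    pairs.append((cuts1[0], cuts2[0]))
    i, j = 0, 0      # indices of the last matched cut in each list
    ni, nj = 1, 1    # candidate indices for the next matching cut

    while ni < len(cuts1) and nj < len(cuts2):
        len1 = cuts1[ni] - cuts1[i]   # scene length since the last match, file 1
        len2 = cuts2[nj] - cuts2[j]   # scene length since the last match, file 2
        if abs(len1 - len2) <= tolerance:
            pairs.append((cuts1[ni], cuts2[nj]))
            i, j = ni, nj
            ni, nj = ni + 1, nj + 1
        elif len1 < len2:
            ni += 1   # file 1 has an extra cut here: extend its scene
        else:
            nj += 1   # file 2 has an extra cut here: extend its scene
    return pairs


# Cut times in seconds, leading 0.000 left out.
file1_cuts = [2.042, 9.000, 11.958, 13.917, 15.875, 18.833, 22.333, 25.292, 30.250]
# Same cuts from the second file, plus the hypothetical extra cut at 9.937.
file2_cuts = [2.000, 8.958, 9.937, 11.917, 13.875, 15.833, 18.792, 22.292, 25.250, 30.208]

pairs = align_cuts(file1_cuts, file2_cuts)
for t1, t2 in pairs:
    print(f"{t1:7.3f}  {t2:7.3f}  {abs(t1 - t2):.3f}")
```

Because each scene length is measured from the last matched cut rather than from the start of the file, one extra cut does not shift every later comparison. The sketch would still fail if the very first cuts do not correspond, or if two different scenes happen to have nearly the same length at the wrong moment.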

Is this possible to do within Python or will I need another tool?
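
As a rough indication that the standard library may be enough, here is a different technique from the row-by-row comparison in the script: instead of aligning rows, every cut in one file is compared with every cut in the other, and the most frequent difference wins. The function name `estimate_offset` and the parameters `max_offset` and `bin_size` are hypothetical, and the one-second search window is an assumption that would need adjusting if the real offset can be larger.

```python
from collections import defaultdict
from statistics import median


def estimate_offset(cuts1, cuts2, max_offset=1.0, bin_size=0.010):
    """Estimate the offset (cuts1 minus cuts2, in seconds) by voting.

    Every pair of cuts closer than max_offset casts a vote for its
    difference, grouped into bins of bin_size seconds. Cuts that exist
    in only one file produce scattered votes that rarely agree.
    Returns None when no pair falls inside the search window.
    """
    bins = defaultdict(list)
    for t1 in cuts1:
        for t2 in cuts2:
            diff = t1 - t2
            if abs(diff) <= max_offset:
                bins[round(diff / bin_size)].append(diff)
    if not bins:
        return None
    winner = max(bins.values(), key=len)   # the bin with the most votes
    return median(winner)


# Same sample values as the tables above; 9.937 is the hypothetical extra cut.
file1_cuts = [2.042, 9.000, 11.958, 13.917, 15.875, 18.833, 22.333, 25.292, 30.250]
file2_cuts = [2.000, 8.958, 9.937, 11.917, 13.875, 15.833, 18.792, 22.292, 25.250, 30.208]

offset = estimate_offset(file1_cuts, file2_cuts)
print(f"Estimated offset: {offset:.3f} s")
```

Unlike the script's `abs()` difference, the result here is signed: a positive value means the cuts in the first file occur later than in the second. One known weakness of this sketch is that differences sitting right on a bin boundary split their votes between two bins.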

Python timecodes alignment adjustments by detecting gap
