Python: adjusting timecode alignment by detecting gaps between scene changes

I have a script that compares two CSV files created by PySceneDetect. Each file lists the scene changes detected in one video. Since the videos show the same content with slightly altered video, I compare the two scene lists to find the best offset. I use that offset to adjust the audio or video track when muxing them into a single file with the video from one source and the audio from the other.
Here is an example of the files I’m comparing:

Timecode List:,00:00:02.042,00:00:09.000,00:00:11.958,00:00:13.917,00:00:15.875,00:00:18.833,00:00:22.333,00:00:25.292,00:00:30.250,00:00:32.458,00:00:37.333,00:00:40.917,00:00:45.875,00:00:50.833,00:00:55.792,00:00:57.417,00:01:00.375,00:01:04.875,00:01:08.917,00:01:16.125,00:01:21.000,00:01:23.958,00:01:24.875,00:01:28.875
Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,49,00:00:02.042,2.042,49,00:00:02.042,2.042
2,50,00:00:02.042,2.042,216,00:00:09.000,9.000,167,00:00:06.958,6.958
3,217,00:00:09.000,9.000,287,00:00:11.958,11.958,71,00:00:02.958,2.958
4,288,00:00:11.958,11.958,334,00:00:13.917,13.917,47,00:00:01.958,1.958
5,335,00:00:13.917,13.917,381,00:00:15.875,15.875,47,00:00:01.958,1.958
6,382,00:00:15.875,15.875,452,00:00:18.833,18.833,71,00:00:02.958,2.958
7,453,00:00:18.833,18.833,536,00:00:22.333,22.333,84,00:00:03.500,3.500
8,537,00:00:22.333,22.333,607,00:00:25.292,25.292,71,00:00:02.958,2.958
9,608,00:00:25.292,25.292,726,00:00:30.250,30.250,119,00:00:04.958,4.958
10,727,00:00:30.250,30.250,779,00:00:32.458,32.458,53,00:00:02.208,2.208
11,780,00:00:32.458,32.458,896,00:00:37.333,37.333,117,00:00:04.875,4.875
12,897,00:00:37.333,37.333,982,00:00:40.917,40.917,86,00:00:03.583,3.583
13,983,00:00:40.917,40.917,1101,00:00:45.875,45.875,119,00:00:04.958,4.958
14,1102,00:00:45.875,45.875,1220,00:00:50.833,50.833,119,00:00:04.958,4.958
15,1221,00:00:50.833,50.833,1339,00:00:55.792,55.792,119,00:00:04.958,4.958
16,1340,00:00:55.792,55.792,1378,00:00:57.417,57.417,39,00:00:01.625,1.625
17,1379,00:00:57.417,57.417,1449,00:01:00.375,60.375,71,00:00:02.958,2.958
18,1450,00:01:00.375,60.375,1557,00:01:04.875,64.875,108,00:00:04.500,4.500
19,1558,00:01:04.875,64.875,1654,00:01:08.917,68.917,97,00:00:04.042,4.042
20,1655,00:01:08.917,68.917,1827,00:01:16.125,76.125,173,00:00:07.208,7.208
21,1828,00:01:16.125,76.125,1944,00:01:21.000,81.000,117,00:00:04.875,4.875
22,1945,00:01:21.000,81.000,2015,00:01:23.958,83.958,71,00:00:02.958,2.958
23,2016,00:01:23.958,83.958,2037,00:01:24.875,84.875,22,00:00:00.917,0.917
24,2038,00:01:24.875,84.875,2133,00:01:28.875,88.875,96,00:00:04.000,4.000
25,2134,00:01:28.875,88.875,2171,00:01:30.458,90.458,38,00:00:01.583,1.583

and

Timecode List:,00:00:02.000,00:00:08.958,00:00:11.917,00:00:13.875,00:00:15.833,00:00:18.792,00:00:22.292,00:00:25.250,00:00:30.208,00:00:32.417,00:00:37.292,00:00:40.875,00:00:45.833,00:00:50.792,00:00:55.750,00:00:57.375,00:01:00.333,00:01:04.833,00:01:08.875,00:01:16.083,00:01:20.958,00:01:23.917,00:01:24.833,00:01:28.833
Scene Number,Start Frame,Start Timecode,Start Time (seconds),End Frame,End Timecode,End Time (seconds),Length (frames),Length (timecode),Length (seconds)
1,1,00:00:00.000,0.000,48,00:00:02.000,2.000,48,00:00:02.000,2.000
2,49,00:00:02.000,2.000,215,00:00:08.958,8.958,167,00:00:06.958,6.958
3,216,00:00:08.958,8.958,286,00:00:11.917,11.917,71,00:00:02.958,2.958
4,287,00:00:11.917,11.917,333,00:00:13.875,13.875,47,00:00:01.958,1.958
5,334,00:00:13.875,13.875,380,00:00:15.833,15.833,47,00:00:01.958,1.958
6,381,00:00:15.833,15.833,451,00:00:18.792,18.792,71,00:00:02.958,2.958
7,452,00:00:18.792,18.792,535,00:00:22.292,22.292,84,00:00:03.500,3.500
8,536,00:00:22.292,22.292,606,00:00:25.250,25.250,71,00:00:02.958,2.958
9,607,00:00:25.250,25.250,725,00:00:30.208,30.208,119,00:00:04.958,4.958
10,726,00:00:30.208,30.208,778,00:00:32.417,32.417,53,00:00:02.208,2.208
11,779,00:00:32.417,32.417,895,00:00:37.292,37.292,117,00:00:04.875,4.875
12,896,00:00:37.292,37.292,981,00:00:40.875,40.875,86,00:00:03.583,3.583
13,982,00:00:40.875,40.875,1100,00:00:45.833,45.833,119,00:00:04.958,4.958
14,1101,00:00:45.833,45.833,1219,00:00:50.792,50.792,119,00:00:04.958,4.958
15,1220,00:00:50.792,50.792,1338,00:00:55.750,55.750,119,00:00:04.958,4.958
16,1339,00:00:55.750,55.750,1377,00:00:57.375,57.375,39,00:00:01.625,1.625
17,1378,00:00:57.375,57.375,1448,00:01:00.333,60.333,71,00:00:02.958,2.958
18,1449,00:01:00.333,60.333,1556,00:01:04.833,64.833,108,00:00:04.500,4.500
19,1557,00:01:04.833,64.833,1653,00:01:08.875,68.875,97,00:00:04.042,4.042
20,1654,00:01:08.875,68.875,1826,00:01:16.083,76.083,173,00:00:07.208,7.208
21,1827,00:01:16.083,76.083,1943,00:01:20.958,80.958,117,00:00:04.875,4.875
22,1944,00:01:20.958,80.958,2014,00:01:23.917,83.917,71,00:00:02.958,2.958
23,2015,00:01:23.917,83.917,2036,00:01:24.833,84.833,22,00:00:00.917,0.917
24,2037,00:01:24.833,84.833,2132,00:01:28.833,88.833,96,00:00:04.000,4.000
25,2133,00:01:28.833,88.833,2170,00:01:30.417,90.417,38,00:00:01.583,1.583

I run the following Python script to compare them:

import csv
import sys
from collections import Counter

def extract_timecodes(file_path):
    timecodes = []
    # Open the file once; newline='' is what the csv module recommends
    with open(file_path, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # Skip the header row
        next(reader)  # Skip the second header row
        for row in reader:
            timecodes.append(row[2])  # Start Timecode in the 3rd column
    return timecodes

def timecode_to_seconds(timecode):
    h, m, s = map(float, timecode.split(':'))
    return h * 3600 + m * 60 + s

def compare_timecodes(tc1, tc2):
    return abs(tc1 - tc2)

def main(file1, file2):
    timecodes1 = extract_timecodes(file1)
    timecodes2 = extract_timecodes(file2)

    # Debugging: Print extracted timecodes
    print("Timecodes from file 1:")
    print(timecodes1)
    print("Timecodes from file 2:")
    print(timecodes2)

    differences = []
    for i in range(1, min(len(timecodes1), len(timecodes2))):  # Ignoring the first value which should always be 0
        tc1 = timecode_to_seconds(timecodes1[i])
        tc2 = timecode_to_seconds(timecodes2[i])
        diff = compare_timecodes(tc1, tc2)
        differences.append(round(diff, 3))

    # Debugging: Print calculated differences
    print("Calculated differences:")
    print(differences)
    
    mean_diff = sum(differences) / len(differences) if differences else 0
    mode_diff = Counter(differences).most_common(1)[0] if differences else (0, 0)
    freq_diffs = Counter(differences).most_common()

    # most_common() already yields unique values sorted by frequency; keep the top 5
    freq_diffs = freq_diffs[:5]

    # Output the results
    print("Differences:")
    for i, diff in enumerate(differences, 2):  # Starting from 2 to ignore beginning 0
        print(f"{i:02}: {diff:.3f}")

    print(" - - Average - - ")
    print(f"Mean: {mean_diff:.5f}")
    print(f"Mode: {mode_diff[0]:.3f}")

    print(" - - Frequency - - ")
    for val, freq in freq_diffs:
        print(f"{val:.3f}({freq})")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python compare_scenes.py <file1> <file2>")
    else:
        main(sys.argv[1], sys.argv[2])

Most of the time this works great, but sometimes there’s an unexpected scene change detected in one file but not in the other. This causes the comparison to show a ton of random numbers because the values no longer align.
An example where the two files line up:

| Scene | File1 | File2 | Offset | Length1 | Length2 |
| :---: | :---: | :---: | :----: | :-----: | :-----: |
| 1 | 00:00:00.000 | 00:00:00.000 | 0.000 | 0.000 | 0.000 |
| 2 | 00:00:02.042 | 00:00:02.000 | 0.042 | 2.042 | 2.000 |
| 3 | 00:00:09.000 | 00:00:08.958 | 0.042 | 6.958 | 6.958 |
| 4 | 00:00:11.958 | 00:00:11.917 | 0.041 | 2.958 | 2.959 |
| 5 | 00:00:13.917 | 00:00:13.875 | 0.042 | 1.959 | 1.958 |
| 6 | 00:00:15.875 | 00:00:15.833 | 0.042 | 1.958 | 1.958 |
| 7 | 00:00:18.833 | 00:00:18.792 | 0.041 | 2.958 | 2.959 |
| 8 | 00:00:22.333 | 00:00:22.292 | 0.041 | 3.500 | 3.500 |
| 9 | 00:00:25.292 | 00:00:25.250 | 0.042 | 2.959 | 2.958 |
| 10 | 00:00:30.250 | 00:00:30.208 | 0.042 | 4.958 | 4.958 |

But let's say it detected another scene change in File2 that wasn't in File1:

| Scene | File1 | File2 | Offset | - | Length1 | Length2 |
| :---: | :---: | :---: | :----: | :-: | :-----: | :-----: |
| 1 | 00:00:00.000 | 00:00:00.000 | 0.000 | | 0.000 | 0.000 |
| 2 | 00:00:02.042 | 00:00:02.000 | 0.042 | | 2.042 | 2.000 |
| 3 | 00:00:09.000 | 00:00:08.958 | 0.042 | | 6.958 | 6.958 |
| 4 | 00:00:11.958 | 00:00:09.937 | 2.021 | | 2.958 | 0.979 |
| 5 | 00:00:13.917 | 00:00:11.917 | 2.000 | | 1.959 | 1.980 |
| 6 | 00:00:15.875 | 00:00:13.875 | 2.000 | | 1.958 | 1.958 |
| 7 | 00:00:18.833 | 00:00:15.833 | 3.000 | | 2.958 | 1.958 |
| 8 | 00:00:22.333 | 00:00:18.792 | 3.541 | | 3.500 | 2.959 |
| 9 | 00:00:25.292 | 00:00:22.292 | 3.000 | | 2.959 | 3.500 |
| 10 | 00:00:30.250 | 00:00:25.250 | 5.000 | | 4.958 | 2.958 |

Once the unexpected scene change appears, every row after it is useless because the scenes no longer line up.

I’m trying to find a way to have the script compare the scene lengths and, if they don’t match, remove or ignore the row in whichever file has the extra scene change (or, put another way, combine the two short scenes into a single one).
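To make the idea concrete, here is a rough sketch of the kind of logic I have in mind; it is not tested against real files. It walks both lists of cut times (in seconds), compares the time elapsed since the last matched cut in each file, and skips a cut when the elapsed times disagree, which effectively merges the two short scenes. The function name `align_cuts`, the `tolerance` parameter and its 0.1 s default are all hypothetical, and the sketch assumes the first cut in each list is the same real scene change.

```python
def align_cuts(cuts1, cuts2, tolerance=0.1):
    """Pair up cut times (in seconds) from two files, skipping extra cuts.

    Assumption: the first cut in each list is the same real scene change.
    Returns a list of (cut_from_file1, cut_from_file2) tuples.
    """
    if not cuts1 or not cuts2:
        return []

    pairs = [(cuts1[0], cuts2[0])]
    last1 = last2 = 0   # indexes of the last matched cut in each list
    next1 = next2 = 1   # indexes of the candidate cuts being compared

    while next1 < len(cuts1) and next2 < len(cuts2):
        # Time since the last matched cut; independent of the overall offset.
        len1 = cuts1[next1] - cuts1[last1]
        len2 = cuts2[next2] - cuts2[last2]

        if abs(len1 - len2) <= tolerance:
            # Same scene length in both files: treat the cuts as a match.
            pairs.append((cuts1[next1], cuts2[next2]))
            last1, last2 = next1, next2
            next1 += 1
            next2 += 1
        elif len1 < len2:
            # File 1 has an extra cut: skip it (merge its scene into the next).
            next1 += 1
        else:
            # File 2 has an extra cut: skip it.
            next2 += 1

    return pairs


if __name__ == "__main__":
    file1 = [2.042, 9.000, 11.958, 13.917, 15.875]
    file2 = [2.000, 8.958, 9.937, 11.917, 13.875, 15.833]  # 9.937 is the extra cut
    for c1, c2 in align_cuts(file1, file2):
        print(f"{c1:.3f} {c2:.3f} offset={c1 - c2:.3f}")
```

With the sample values above, the extra 9.937 cut in the second list is skipped and the remaining cuts pair up again. The offsets could then be fed into the same mean/mode calculation the script already does.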

I have tried asking ChatGPT to help me make that adjustment, but every attempt has failed.

What logic can I use to compare the time between consecutive scene changes (for example, scene 1 to scene 2) in both files and adjust when they don’t match, or at least come close?

I was thinking 100 ms should be close enough to count as a match: in good files I have seen no more than 10 ms of difference, while in bad files the difference is always at least 400 ms.
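As an illustration of that threshold, a check along these lines could report where the two files first disagree. This is only a sketch: `first_length_mismatch` and its `tolerance` parameter are names I made up, and it assumes both lists hold scene start times in seconds, beginning at 0.

```python
def first_length_mismatch(starts1, starts2, tolerance=0.1):
    """Return the 0-based index of the first scene whose length differs by
    more than `tolerance` seconds between the two files, or None if every
    compared scene matches.
    """
    # Scene length = next start time minus this start time.
    lengths1 = [b - a for a, b in zip(starts1, starts1[1:])]
    lengths2 = [b - a for a, b in zip(starts2, starts2[1:])]

    for index, (len1, len2) in enumerate(zip(lengths1, lengths2)):
        if abs(len1 - len2) > tolerance:
            return index
    return None


if __name__ == "__main__":
    good1 = [0.0, 2.042, 9.000, 11.958, 13.917]
    good2 = [0.0, 2.000, 8.958, 11.917, 13.875]
    bad2 = [0.0, 2.000, 8.958, 9.937, 11.917]  # extra cut at 9.937

    print(first_length_mismatch(good1, good2))
    print(first_length_mismatch(good1, bad2))
```

For the aligned lists it finds no mismatch. For the list with the extra cut it points at the third scene (index 2), which is where the rows stop lining up in my second table.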

Is this possible to do within Python or will I need another tool?
