I am creating highlighted word-level subtitles that stay in sync with the speaker's voice. I have attached the code, the JSON, and the output SRT.
This is the code I am having issues with. What I want is to create an SRT file from the JSON I have, which contains the start and end timestamps. The output SRT does not have accurate timestamps.
@app.get("/convert")
def convert():
    def format_time(seconds):
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        seconds = int(seconds % 60)
        milliseconds = int((seconds - int(seconds)) * 1000)  # Convert fractional part to milliseconds
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

    def convert_to_srt(json_data):
        srt_content = ""
        segment_number = 1
        for segment in json_data:
            sentence = segment['text']
            segment_start_time = segment['start']
            for word in segment['words']:
                if 'start' in word and 'end' in word:
                    word_start_time = segment_start_time + word['start']
                    word_end_time = segment_start_time + word['end']
                    start_time = format_time(word_start_time)
                    end_time = format_time(word_end_time)
                    highlighted_word = f"<u>{word['word']}</u>" if word['word'] != "." else word['word']
                    highlighted_sentence = sentence.replace(word['word'], highlighted_word, 1)
                    srt_content += f"{segment_number}\n{start_time} --> {end_time}\n{highlighted_sentence}\n\n"
                    segment_number += 1
        return srt_content
This is the JSON:
[{"start": 16.556, "end": 19.177, "text": " Everyone, please think of your biggest personal goal.", "words": [{"word": "Everyone,", "start": 16.556, "end": 16.916, "score": 0.804}, {"word": "please", "start": 16.996, "end": 17.316, "score": 0.707}, {"word": "think", "start": 17.396, "end": 17.656, "score": 0.768}, {"word": "of", "start": 17.736, "end": 17.816, "score": 0.692}, {"word": "your", "start": 17.876, "end": 18.016, "score": 0.641}, {"word": "biggest", "start": 18.057, "end": 18.357, "score": 0.898}, {"word": "personal", "start": 18.477, "end": 18.897, "score": 0.795}, {"word": "goal.", "start": 18.937, "end": 19.177, "score": 0.834}]}, {"start": 19.197, "end": 20.857, "text": "Okay, for real.", "words": [{"word": "Okay,", "start": 19.197, "end": 20.437, "score": 0.741}, {"word": "for", "start": 20.497, "end": 20.617, "score": 0.761}, {"word": "real.", "start": 20.637, "end": 20.857, "score": 0.827}]}, {"start": 21.218, "end": 22.718, "text": "Take a second, you gotta feel this to learn
And this is the output SRT:
1
00:00:33,000 --> 00:00:33,000
<u>Everyone,</u> please think of your biggest personal goal.

2
00:00:33,000 --> 00:00:33,000
Everyone, <u>please</u> think of your biggest personal goal.

3
00:00:33,000 --> 00:00:34,000
Everyone, please <u>think</u> of your biggest personal goal.

4
00:00:34,000 --> 00:00:34,000
According to the logic in the convert_to_srt function, I believe it should produce something like this, which is what I want:
1
00:00:16,556 --> 00:00:16,916
<u>Everyone,</u> please think of your biggest personal goal.

2
00:00:16,996 --> 00:00:17,316
Everyone, <u>please</u> think of your biggest personal goal.
But the output I am getting is completely inaccurate.
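Comparing the posted code with the sample JSON and output suggests two likely causes, though this is only a reading of what is shown here. First, format_time reassigns seconds to an int before computing milliseconds, so the fractional part is already gone and the result is always 000. Second, the word timestamps in the sample JSON look absolute rather than relative to the segment (the first word starts at 16.556, the same as its segment), so adding segment['start'] again gives about 33.1 seconds, which matches the 00:00:33 in the output. A minimal sketch of both functions under those assumptions, without the FastAPI route; the use of divmod and the cues list are illustrative choices, not part of the original code:

```python
def format_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds so the fractional part is not lost
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def convert_to_srt(json_data: list) -> str:
    """Build one SRT cue per word, underlining the word being spoken."""
    cues = []
    for segment in json_data:
        # Drop the leading space seen in the sample JSON text
        sentence = segment["text"].strip()
        for word in segment["words"]:
            # Skip words that carry no timing information
            if "start" not in word or "end" not in word:
                continue
            # Assumption: word times are absolute, so no segment offset is added
            start_time = format_time(word["start"])
            end_time = format_time(word["end"])
            token = word["word"]
            highlighted = f"<u>{token}</u>" if token != "." else token
            # Same behavior as the original: only the first occurrence is replaced
            text = sentence.replace(token, highlighted, 1)
            cues.append(f"{len(cues) + 1}\n{start_time} --> {end_time}\n{text}\n\n")
    return "".join(cues)
```

If the word times in your real data turn out to be relative to each segment, the segment offset would need to be added back. Note also that replace(..., 1) always marks the first occurrence, so a word repeated within one sentence would be underlined in the wrong place on its second appearance.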