I am working on a project where I need to extract text from the frames of an Instagram Reels video. I used yt-dlp to download the video, extracted frames with ffmpeg, and attempted to read the text from the frames using Tesseract OCR.
Here’s the workflow I followed: I extracted frames from the video with ffmpeg at 5 frames per second, then used Tesseract OCR to extract text from the frames.
However, I’m unable to extract text from the frames. Below is the code snippet I’m using:
from PIL import Image
import pytesseract
import os

# Path to the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Path to the image file
image_path = r"Insta Reels\frame_0077.png"

try:
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"The file {image_path} does not exist.")

    # Open the image file
    image = Image.open(image_path)

    # Perform OCR on the image
    text = pytesseract.image_to_string(image, lang='eng')

    if text.strip():
        print("Extracted text:")
        print(text)
    else:
        print("No text was extracted from the image.")
except FileNotFoundError as e:
    print(f"Error: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
The problem is that the extracted text is either incomplete or missing entirely.
What preprocessing steps should I apply to these frames to improve the accuracy of Tesseract OCR?
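I have started experimenting with a simple preprocessing pass using only Pillow, but I'm not sure it's the right approach. The function name and the `scale`/`threshold` values below are just my own guesses, not something I've validated:

```python
from PIL import Image, ImageFilter, ImageOps

def preprocess_frame(image, scale=2, threshold=150):
    """Prepare a video frame for Tesseract: grayscale, upscale, sharpen, binarize.

    `scale` and `threshold` are rough starting values; they need tuning per video.
    """
    # Collapse to a single luminance channel so thresholding is meaningful
    gray = ImageOps.grayscale(image)
    # Upscale: Tesseract tends to do better when text is larger on the page
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Light sharpening to recover edges softened by video compression
    gray = gray.filter(ImageFilter.SHARPEN)
    # Binarize: pixels brighter than the threshold become white, the rest black
    return gray.point(lambda p: 255 if p > threshold else 0, mode="1")
```

I would then pass the returned image to `pytesseract.image_to_string` instead of the raw frame.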
Here is the function I use to extract frames from the video:

import os
import subprocess

def extract_frames(video_path, output_folder):
    if not os.path.isfile(video_path):
        raise FileNotFoundError(f"Video file not found at {video_path}")

    temp_folder = os.path.join(output_folder, 'temp_frames')
    os.makedirs(temp_folder, exist_ok=True)

    # Extract frames at a higher rate
    command = [
        'ffmpeg', '-i', video_path,
        '-vf', 'fps=5',  # extract five frames per second for better coverage
        os.path.join(temp_folder, 'frame_%04d.png')
    ]
    subprocess.run(command, check=True)
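Once per-frame OCR works, my plan is to collapse duplicates, since at 5 fps the same caption appears in many consecutive frames. A sketch of what I have in mind (`dedupe_captions` is a hypothetical helper of mine, fed the per-frame OCR strings in order):

```python
def dedupe_captions(per_frame_text):
    """Collapse runs of identical OCR output from consecutive frames.

    `per_frame_text` is a list of strings, one per frame, in playback order.
    Returns the distinct captions in the order they first appear on screen.
    """
    captions = []
    previous = None
    for raw in per_frame_text:
        text = raw.strip()
        if text and text != previous:
            captions.append(text)
        # Frames where OCR found nothing don't end the current run
        previous = text or previous
    return captions
```

For example, `dedupe_captions(["Hi", "Hi", "", "Bye", "Bye"])` should give `["Hi", "Bye"]`.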