I’m trying to build a Python script that can detect text that flashes on the screen for a very short period (around 0.2 seconds). I’m using mss for screen capturing and pytesseract for OCR. Below is the code I’m working with:
```python
import pytesseract
import numpy as np
from mss import mss
import time
import threading

# Point pytesseract at the tesseract binary
# (note the attribute is pytesseract.pytesseract.tesseract_cmd)
pytesseract.pytesseract.tesseract_cmd = "/opt/homebrew/bin/tesseract"

# Shared state between threads
last_detected_text = ""
detected_text_lock = threading.Lock()

# Capture the screen in a loop and run OCR on each frame
def capture_and_process_text():
    global last_detected_text
    # mss instances are not thread-safe, so create one inside the thread that uses it
    sct = mss()
    monitor = sct.monitors[1]  # full screen (adjust the index for multiple monitors)
    while True:
        start_time = time.time()
        # Capture the screen as a BGRA numpy array
        screenshot = np.array(sct.grab(monitor))
        # Skip grayscale conversion and thresholding for speed
        text = pytesseract.image_to_string(screenshot, lang="eng").strip()
        # Collapse whitespace to reduce noise
        normalized_text = " ".join(text.split())
        # Only report new, non-empty text
        with detected_text_lock:
            if normalized_text and normalized_text != last_detected_text:
                print(f"Detected Text: {normalized_text}")
                last_detected_text = normalized_text
        end_time = time.time()
        print(f"Frame processed in {end_time - start_time:.5f} seconds")

print("Starting full-screen text capture...")
# daemon=True lets the program exit when the main thread is interrupted
capture_thread = threading.Thread(target=capture_and_process_text, daemon=True)
capture_thread.start()

try:
    while True:
        time.sleep(1)  # Keep the main thread alive
except KeyboardInterrupt:
    print("Text capture stopped.")
```
This works reasonably well for capturing and processing text on the screen, but I’m facing a few challenges:
- Speed: sometimes the script isn't fast enough to catch very brief flashes; I'm aiming to catch text that stays on screen for only about 0.2 seconds.
- OCR processing overhead: pytesseract is slow on full-screen images, and I'm wondering if there's a way to speed it up.
- Capturing all text: since I don't know where on the screen the text will appear, I have to capture the entire screen, which adds overhead.
I’m looking for advice on how to optimize this code to make it faster and more reliable. Specifically:
1. Is there a way to make screen capture faster while still processing the entire screen?
2. Are there faster alternatives to pytesseract that work well for real-time OCR?
3. Any general tips for optimizing the capture-then-OCR workflow to handle such short flashes?
I’d appreciate any guidance or suggestions on how to approach this problem. Thanks in advance!
Can you identify where or when the flashes happen (even if not with 100% accuracy) without doing full OCR on the whole screen?
Like, how about this:
Every detection interval (200ms or so), cheaply compare the current screen with the previous one to see if text might have appeared. False-positive results are okay but false-negative results are not. This must always reliably complete within your 200ms detection interval.
How you implement this depends on what the text looks like and what else is going on on the screen. For example, if the screen is otherwise static (no animations), you could just look to see if any pixels have changed. Or if you know the text or background will have specific colors, you could look for those.
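For the mostly-static-screen case, here's a minimal sketch of what that cheap detector could look like, using mss as in your code. The sampling stride, the per-channel tolerance of 20, the changed-pixel `threshold`, and the `on_change` callback are all placeholder choices you'd need to tune for your screen:

```python
import time
import numpy as np
from mss import mss

def run_detector(on_change, interval=0.2, threshold=5000):
    """Every `interval` seconds, cheaply diff the screen against the
    previous frame and call on_change(frame) when enough pixels changed."""
    sct = mss()
    monitor = sct.monitors[1]
    prev = None
    while True:
        start = time.monotonic()
        frame = np.array(sct.grab(monitor))  # BGRA numpy array
        # Sample every 4th pixel so the comparison stays well under 200 ms
        small = frame[::4, ::4, :3].astype(np.int16)
        if prev is not None:
            # A pixel counts as "changed" if any channel moved by more than 20 levels
            changed = np.count_nonzero(np.abs(small - prev).max(axis=2) > 20)
            if changed > threshold:
                on_change(frame)  # false positives are fine; OCR sorts them out
        prev = small
        # Sleep out the remainder of the detection interval
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
```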
When you think text might have appeared, copy that area of the screen into a queue. A separate thread will pick it up from there and do the expensive OCR part.
The OCR thread is allowed to take longer than your 200ms detection interval, as long as, on average, it’s able to process images from the queue faster than the quick-detection thread is inserting them.
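A sketch of that hand-off, using a standard `queue.Queue` and assuming the hypothetical `run_detector` from the sketch above as the producer (for simplicity this enqueues the whole frame rather than just the changed area):

```python
import queue
import threading
import numpy as np
import pytesseract

ocr_queue = queue.Queue()

def ocr_worker():
    """Consume frames and run the expensive OCR step off the hot path."""
    last_text = ""
    while True:
        frame = ocr_queue.get()  # blocks until the detector enqueues a frame
        # Convert BGRA -> RGB; pytesseract accepts numpy arrays directly
        rgb = np.ascontiguousarray(frame[:, :, :3][:, :, ::-1])
        text = " ".join(pytesseract.image_to_string(rgb, lang="eng").split())
        if text and text != last_text:
            print(f"Detected Text: {text}")
            last_text = text
        ocr_queue.task_done()

threading.Thread(target=ocr_worker, daemon=True).start()

# The detection loop only enqueues; it never waits on OCR
run_detector(on_change=ocr_queue.put)
```

If the queue ever grows without bound, that's your signal that OCR is falling behind the detector and you need to shrink the captured region or throttle enqueues.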