I’m trying to automate the login process for the Icinga dashboard through MagicInfo using Selenium WebDriver in Python. The challenge is that MagicInfo opens a remote window for the monitors at a very low resolution, so Tesseract OCR cannot reliably recognize the text, which makes the login automation unreliable.
What I’ve Done So Far:
Installed Required Libraries:
Selenium
Tesseract OCR (pytesseract)
Pillow (PIL)
OpenCV for image processing
Increased Browser Zoom:
I tried increasing the browser zoom level to improve the effective resolution, but it didn’t significantly improve the OCR results.
Image Processing:
I used OpenCV to preprocess the screenshots (resizing, converting to grayscale, applying thresholding), but the text recognition is still unreliable.
Coordinates Method:
I used the coordinates returned by Tesseract to simulate clicks, but this approach is fragile as the coordinates often change with minor UI adjustments.
Example Code:
import time

import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import Output
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'  # Update this path for your OS

driver = webdriver.Chrome()
driver.get("https://example.com")

# Zoom the webpage to increase the apparent resolution
driver.execute_script("document.body.style.zoom='200%'")
time.sleep(5)

screenshot_path = "screenshot.png"
driver.save_screenshot(screenshot_path)

# Load and preprocess the screenshot
image = cv2.imread(screenshot_path)
resized = cv2.resize(image, (0, 0), fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

processed_image_path = "processed_image.png"
cv2.imwrite(processed_image_path, binary)

# Perform OCR
ocr_result = pytesseract.image_to_string(Image.open(processed_image_path))
print(ocr_result)

# Extract word bounding boxes and click on the target text
ocr_data = pytesseract.image_to_data(Image.open(processed_image_path), output_type=Output.DICT)

search_text = "Username"
text_coordinates = None
for i in range(len(ocr_data['text'])):
    if ocr_data['text'][i].strip() == search_text:
        x, y, w, h = (ocr_data['left'][i], ocr_data['top'][i],
                      ocr_data['width'][i], ocr_data['height'][i])
        # Divide by 2 to undo the 2x resize so the offsets refer to the
        # original screenshot; note that move_by_offset is relative to
        # the current pointer position, not the viewport origin.
        text_coordinates = ((x + w // 2) // 2, (y + h // 2) // 2)
        break

if text_coordinates:
    actions = ActionChains(driver)
    actions.move_by_offset(text_coordinates[0], text_coordinates[1]).click().perform()
else:
    print(f"Text '{search_text}' not found")

driver.quit()
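One subtlety in the click step above: the OCR boxes come from an image that was resized 2x (and the screenshot itself may be captured at a device pixel ratio above 1), so the pixel coordinates must be mapped back to CSS pixels before Selenium can use them. A minimal helper showing the mapping (the function name is mine; the scale factors are assumed to be known):

```python
def ocr_box_to_css_point(left, top, width, height,
                         resize_factor=2.0, device_pixel_ratio=1.0):
    """Map a Tesseract bounding box (in processed-image pixels) to the
    box center in CSS pixels, the space ActionChains offsets live in.

    resize_factor:      the fx/fy passed to cv2.resize before OCR
    device_pixel_ratio: window.devicePixelRatio of the browser session
    """
    center_x = left + width / 2
    center_y = top + height / 2
    scale = resize_factor * device_pixel_ratio
    return round(center_x / scale), round(center_y / scale)

# A box at (400, 200) sized 80x40 in the 2x image maps back to the
# original screenshot's coordinate space:
print(ocr_box_to_css_point(400, 200, 80, 40))  # (220, 110)
```

On a HiDPI display, `device_pixel_ratio` (readable via `driver.execute_script("return window.devicePixelRatio")`) must be folded in as well, otherwise every click lands short of the target.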
Issues:
Low OCR Accuracy:
The resolution of the remote desktop session via MagicInfo is too low for Tesseract to accurately recognize text.
Unreliable Coordinates:
The coordinates method is not reliable as minor changes in the UI layout cause the coordinates to shift.
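A partial workaround for the accuracy problem is to loosen the exact string comparison (`ocr_data['text'][i].strip() == search_text`) so near-misses like "Usernarne" still match; the stdlib difflib is enough for a sketch (`find_fuzzy` and the 0.8 threshold are my own choices):

```python
from difflib import SequenceMatcher

def find_fuzzy(words, target, min_ratio=0.8):
    """Return the index of the OCR word most similar to `target`,
    or None if nothing clears the similarity threshold."""
    best_index, best_ratio = None, min_ratio
    for i, word in enumerate(words):
        ratio = SequenceMatcher(None, word.strip().lower(), target.lower()).ratio()
        if ratio >= best_ratio:
            best_index, best_ratio = i, ratio
    return best_index

# Typical OCR output where "Username" was misread as "Usernarne"
words = ["Login", "Usernarne", "Password", ""]
print(find_fuzzy(words, "Username"))  # 1
```

This does not fix the underlying resolution problem, but it makes the lookup tolerant of the single-character misreads Tesseract produces on small text.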
Questions:
How can I improve the OCR accuracy in a low-resolution environment?
Is there a more robust method to interact with dynamic web elements using OCR or any other technique?
Are there any tools or libraries better suited for this task that can handle dynamic UI changes more effectively?
Can I increase the remote desktop resolution, or adjust any other environment settings, to improve the screenshot quality?
Any suggestions or best practices to tackle these challenges would be greatly appreciated!