I am currently writing a Python program that takes an image containing a lot of text, extracts the text to a .txt file, compares the extracted words with a list of words in another file, and computes the pixel coordinates of each listed word found in the image. If a word is found, a red square is drawn around it in the image. So far the coordinates part works properly: the squares are drawn around the words and the coordinates match very accurately.
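To make the matching step concrete, here is a minimal sketch of how it could work, not my exact code. It assumes the OCR result has the shape returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, i.e. parallel lists under the keys `text`, `conf`, `left`, `top`, `width` and `height`. The names `find_word_boxes`, `draw_boxes` and `min_conf` are hypothetical and chosen only for illustration.

```python
def find_word_boxes(data, targets, min_conf=0):
    """Return (word, left, top, right, bottom) for each OCR token listed in targets.

    `data` is assumed to be a dict of parallel lists, as produced by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    wanted = {t.lower() for t in targets}
    boxes = []
    for i, raw in enumerate(data["text"]):
        # Normalise the token: trim whitespace and surrounding punctuation.
        word = raw.strip().strip(".,;:()").lower()
        # Skip empty tokens and tokens below the confidence threshold.
        if not word or float(data["conf"][i]) < min_conf:
            continue
        if word in wanted:
            left, top = data["left"][i], data["top"][i]
            boxes.append(
                (word, left, top, left + data["width"][i], top + data["height"][i])
            )
    return boxes


def draw_boxes(image, boxes, colour="red"):
    """Draw one rectangle per box on a PIL image and return the image."""
    # Imported here so the matcher above works without Pillow installed.
    from PIL import ImageDraw

    # The image should be in RGB mode, otherwise "red" is drawn as a grey level.
    draw = ImageDraw.Draw(image)
    for _, left, top, right, bottom in boxes:
        draw.rectangle([left, top, right, bottom], outline=colour, width=2)
    return image
```

Working from the word-level boxes rather than from the plain text keeps each word and its position together, so a word that is detected always has coordinates.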
My problem is the word detection: several words that are in the image are NOT found by the OCR. I think the cause is that they are not written on the same line but are separated by large areas of white space, so the sentences are “cut”. For example, "Journal Voucher" becomes "Journal", and the word "Voucher" only appears several words later.
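One idea for the "cut" phrases is to match them by position instead of by reading order. The following is only a sketch under assumptions: each token is a hypothetical tuple `(word, left, top, right, bottom)` in pixels, and the thresholds `max_dy` and `max_gap` are made-up values that would need tuning for the real image. It treats two words as part of the same phrase when they sit on roughly the same row and the second starts within `max_gap` pixels to the right of the first, no matter how far apart they are in the OCR text.

```python
def find_phrase_boxes(tokens, phrase, max_dy=8, max_gap=150):
    """Find a multi-word phrase by geometry and return one bounding box per match.

    tokens: list of (word, left, top, right, bottom) tuples (assumed shape).
    max_dy: maximum vertical offset, in pixels, to count as the same row.
    max_gap: maximum horizontal gap, in pixels, between consecutive words.
    """
    words = phrase.lower().split()
    matches = []
    for first in (t for t in tokens if t[0].lower() == words[0]):
        chain = [first]
        for word in words[1:]:
            prev = chain[-1]
            # Tokens with the right text, on the same row, to the right of prev.
            candidates = [
                t for t in tokens
                if t[0].lower() == word
                and abs(t[2] - prev[2]) <= max_dy
                and 0 <= t[1] - prev[3] <= max_gap
            ]
            if not candidates:
                break
            # Keep the candidate closest to the previous word.
            chain.append(min(candidates, key=lambda t: t[1] - prev[3]))
        else:
            # Every word of the phrase was found: merge the boxes into one.
            matches.append((
                min(t[1] for t in chain), min(t[2] for t in chain),
                max(t[3] for t in chain), max(t[4] for t in chain),
            ))
    return matches
```

This only helps when the OCR detects both words somewhere in the image. It does not recover words that were missed entirely.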
I ran a test with the OneNote text detection function and the results are very good, so I am convinced it is possible to get better results when detecting the text.
I would appreciate any help improving the text detection (another library? a different approach?). I have run out of ideas on how to improve it, but it is imperative that I do; otherwise the missed words do not get any coordinates either.
This is the part of my code that handles the image:
from PIL import Image, ImageEnhance, ImageFilter, ImageOps
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\xxx\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'
image_path = r'C:\Users\xxx\Pictures\xxx.png'
image = Image.open(image_path)
# Convert to grayscale
image = image.convert('L')
# Enhance contrast
image = ImageEnhance.Contrast(image).enhance(2)
# Sharpen the image
image = image.filter(ImageFilter.SHARPEN)
# Reduce noise
image = image.filter(ImageFilter.MedianFilter(size=3))
# Save the preprocessed image
preprocessed_image_path = r'C:Usersxxxxxxpreprocessed_image.png'
image.save(preprocessed_image_path)
print(f"Preprocessed image saved to {preprocessed_image_path}")
# Run OCR (--oem 3: default engine, --psm 3: fully automatic page segmentation)
custom_config = r'--oem 3 --psm 3'
ocr_text = pytesseract.image_to_string(image, config=custom_config)
# Save OCR text to a .txt file
txt_output_path = r'C:\Users\xxx\Documents\ocr_output.txt'
with open(txt_output_path, 'w', encoding='utf-8') as file:
    file.write(ocr_text)
print(f"OCR text saved to {txt_output_path}")
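One thing I have not ruled out is the page segmentation mode. The code above uses `--psm 3` (fully automatic), while Tesseract also offers modes such as 4 (single column), 6 (single uniform block), 11 (sparse text) and 12 (sparse text with orientation detection). Below is a rough sketch of how the modes could be compared. The helper name `compare_psm_modes` and the `ocr` callable are hypothetical; the callable is assumed to take a config string and return the OCR text. Whether the sparse modes actually help with this layout is a guess.

```python
def compare_psm_modes(ocr, targets, modes=(3, 4, 6, 11, 12)):
    """Return, for each page segmentation mode, the target words that were found.

    ocr: callable taking a Tesseract config string and returning the OCR text,
         e.g. lambda cfg: pytesseract.image_to_string(image, config=cfg)
    targets: iterable of words expected in the image.
    """
    wanted = {t.lower() for t in targets}
    results = {}
    for psm in modes:
        text = ocr(f"--oem 3 --psm {psm}")
        # Normalise tokens the same way for every mode before comparing.
        seen = {w.strip(".,;:()").lower() for w in text.split()}
        results[psm] = sorted(wanted & seen)
    return results
```

Passing the OCR call in as a function keeps the comparison independent of Tesseract, so the same helper could also be used to compare different preprocessing steps.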
This is a small part of the image I am working with. The original image is much larger, but the format is similar.
small image