I have a batch of images containing single digits, ordinary English characters, and special characters. All images have extremely low resolution (11×26 pixels or slightly more). Examples: the digit 1 (an ordinary character) and a "c in a circle" (a special character).
Backup links to the images:
1: https://ibb.co/kc40GyF
8: https://ibb.co/Jrbtgb1
9: https://ibb.co/rdYz6MS
c in a circle (special character): https://ibb.co/kSP5tP2
I want to recognize these characters and get confidence scores, so that I can keep only the digits and ordinary English characters and remove the special characters from the batch.
I expect the confidence score for special characters to be much lower than for ordinary characters, so I could use it to identify and remove them.
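To make the selection rule concrete, here is a minimal sketch of the filtering I have in mind. It assumes each image has already been reduced to a text and a confidence by some OCR engine. The `Recognition` type, the `keep_alphanumeric` helper, the threshold of 60, and the file names and scores in `batch` are all hypothetical and only illustrate the idea.

```python
from typing import List, NamedTuple


class Recognition(NamedTuple):
    """Hypothetical container for one OCR result."""
    image_name: str
    text: str
    confidence: float  # assumed to be on a 0-100 scale


def keep_alphanumeric(results: List[Recognition],
                      min_confidence: float = 60.0) -> List[Recognition]:
    """Keep results that are a single ASCII letter or digit with enough confidence."""
    kept = []
    for r in results:
        text = r.text.strip()
        is_plain_char = len(text) == 1 and text.isascii() and text.isalnum()
        if is_plain_char and r.confidence >= min_confidence:
            kept.append(r)
    return kept


# Made-up values, not real OCR output.
batch = [
    Recognition("cropped_1.png", "1", 93.0),
    Recognition("cropped_2.png", "c", 21.5),  # e.g. a circled c misread as "c"
    Recognition("cropped_3.png", "©", 88.0),  # non-ASCII symbol, dropped regardless of score
    Recognition("cropped_4.png", "A", 75.0),
]
print([r.image_name for r in keep_alphanumeric(batch)])
```

Whether a single threshold separates the two groups cleanly is exactly what I cannot check until I get usable confidence values.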
I tried PaddleOCR and EasyOCR, but they did not detect anything in these images.
After that I tried to upscale the image with:
'upscaled_image = cv2.resize(low_res_image, (2600, 1100), interpolation=cv2.INTER_CUBIC)'
But that did not help. If I upscale the image slightly, nothing changes. If I upscale it a lot, the image becomes extremely blurry and I cannot remove the blur.
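For comparison, a gentler enlargement could be sketched as below: a small integer scale factor with nearest-neighbour pixel replication (so no interpolation blur is introduced) plus a white margin around the glyph. This is an assumption about what might help, not something I have verified on my images. The helper name `upscale_with_margin`, the factor of 4, and the 10-pixel margin are arbitrary illustrative choices, and plain NumPy is used here instead of `cv2.resize`.

```python
import numpy as np


def upscale_with_margin(gray: np.ndarray, factor: int = 4, margin: int = 10) -> np.ndarray:
    """Enlarge a 2-D grayscale image by pixel replication and add a white border."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    # Repeat every pixel `factor` times along rows, then along columns.
    enlarged = np.repeat(np.repeat(gray, factor, axis=0), factor, axis=1)
    # Pad with white (255) so the glyph does not touch the image edge.
    return np.pad(enlarged, margin, mode="constant", constant_values=255)


# Synthetic 26x11 (rows x columns) image standing in for one cropped character.
tiny = np.zeros((26, 11), dtype=np.uint8)
big = upscale_with_margin(tiny)
print(big.shape)  # (124, 64)
```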
I also tried Pytesseract. The detection results are wrong and, on top of that, the confidence score is always 0. The confidence score is the most important thing for me, because I do not know any other way to detect the special characters.
My code:
import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Read the low-resolution character image
image_path = "cropped_images/3/cropped_1.png"
img = cv2.imread(image_path)
# Preprocess the image if necessary (e.g., resizing, denoising)
# Example: Resize the image to enhance visibility
resized_img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
# Convert the image to grayscale
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
# Perform thresholding or any other necessary preprocessing steps
# Example: Apply adaptive thresholding
thresh_img = cv2.adaptiveThreshold(gray_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
#thresh_img=img
# Perform OCR using Pytesseract
custom_config = r'--oem 3 --psm 6 eng' # Modify OCR configuration as needed
recognized_text = pytesseract.image_to_string(thresh_img, config=custom_config)
# Display the recognized text
print("Recognized Text:", recognized_text)
results = pytesseract.image_to_boxes(thresh_img, config=custom_config)
for result in results.splitlines():
    result = result.split(' ')
    character = result[0]
    confidence = result[1]
    #print(f"Character: {character}, Confidence: {int(confidence)/10}")
    print(f"Character: {character}, Confidence: {confidence}")
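One thing worth noting about the loop above: as far as I understand, each line returned by `image_to_boxes` has the form `char left bottom right top page`, so `result[1]` is the left x-coordinate of the box rather than a confidence. Confidence values are reported by `pytesseract.image_to_data`, whose default string output is tab-separated with a `conf` column and a `text` column. The sketch below parses such a string. The `parse_tsv_confidences` helper and the sample TSV are made up for illustration and are not output from my images, and the single-character page segmentation mode (`--psm 10`) in the final comment is an untested assumption.

```python
import csv
import io
from typing import List, Tuple


def parse_tsv_confidences(tsv: str) -> List[Tuple[str, float]]:
    """Return (text, confidence) for every row that carries recognized text."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        # Structural rows (page, block, line) have conf == -1 and no text.
        if text and conf >= 0:
            pairs.append((text, conf))
    return pairs


# Hand-written sample in the column layout of image_to_data; not real output.
sample_tsv = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t64\t124\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t12\t14\t40\t96\t91.5\t8\n"
)
print(parse_tsv_confidences(sample_tsv))

# With Tesseract installed, the string would come from something like:
# tsv = pytesseract.image_to_data(thresh_img, lang="eng", config="--psm 10")
```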
How can I recognize such images and get a confidence score? How can I enhance them to make recognition possible? Or is there another way to detect the special characters in the batch?