I am starting an OCR project, and I thought the simplest first task would be to parse a block of digits, so I created a simple image containing one.
However, I am struggling to parse this simple image reliably (see below). The most obvious mistake is that ‘0’ gets classified as ‘8’.
The image is high quality; it is a section cropped from a larger image I have. I don’t know whether I am doing something incorrectly or whether this is really the limit of Tesseract. Below is my code:
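import cv2
import pytesseract

def read_gray(file):
    # Load the image from disk and convert it to grayscale
    img = cv2.imread(file)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return gray

# Restrict output to digits; --psm 6 assumes a single uniform block of text
config = '-c tessedit_char_whitelist=0123456789 --psm 6'
text_raw = pytesseract.image_to_string(read_gray('example.png'), config=config)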
What am I doing wrong? Is it worth it to slice this image into small squares each containing a single digit? Can this be automated?
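To make the slicing idea concrete, here is a minimal sketch of what I have in mind. It assumes the digits sit on a regular grid with a known number of rows and columns, which I have not verified holds for my image. The names `cell_boxes` and `ocr_digits_by_cell` are made up for illustration, and `--psm 10` is Tesseract's single-character page segmentation mode. I don't know whether this would actually improve accuracy.

```python
def cell_boxes(height, width, rows, cols):
    """Return (y0, y1, x0, x1) crop boxes for a rows x cols grid, row by row."""
    boxes = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            boxes.append((y0, y1, x0, x1))
    return boxes


def ocr_digits_by_cell(gray, rows, cols):
    """OCR each grid cell separately as a single character (hypothetical helper)."""
    import pytesseract  # imported here so cell_boxes can be used without it

    # Assumption: each cell holds exactly one digit, so --psm 10 applies
    config = '-c tessedit_char_whitelist=0123456789 --psm 10'
    height, width = gray.shape[:2]
    digits = []
    for y0, y1, x0, x1 in cell_boxes(height, width, rows, cols):
        cell = gray[y0:y1, x0:x1]  # crop one cell from the grayscale array
        digits.append(pytesseract.image_to_string(cell, config=config).strip())
    return digits
```

With my code above, this would be called as `ocr_digits_by_cell(read_gray('example.png'), rows, cols)`, where `rows` and `cols` are counted by hand.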
I feel like I have missed something and I’m making things more complicated.