Currently, we pre-process the image and then use Tesseract to extract text from this particular receipt.
Please note that I’m only asking for this particular receipt because I hope it’s easier to answer the question with a specific problem and avoid being generic.
Here’s the original receipt:
After pre-processing, the image looks like this:
The pre-processing involves blurring, among other steps. The result is quite bad, so I’ll skip the details.
The text extracted by Tesseract is gibberish, as expected, because the pre-processed image is so bad. Here is one part:
™7 174 Chse CRYVALL SORHS 515 iy X 7
o2t 130 Ll PLST 1K 1A
JBTOTAL 763.25 AT
SALES TAX 78.23 L SRy
Torak $841.48 T :
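One thing the fragment above does show is that the amounts seem to survive even where the labels are mangled: 763.25 + 78.23 = 841.48. Below is a minimal sketch of how that consistency could be checked automatically, which might serve as a rough accuracy signal when comparing pre-processing variants. The regex, the helper name `first_amount`, and the assumption that the three lines appear in subtotal, tax, total order are all illustrative and not part of our actual pipeline.

```python
import re
from decimal import Decimal

# The last three OCR lines quoted above, copied verbatim.
ocr_lines = [
    "JBTOTAL 763.25 AT",
    "SALES TAX 78.23 L SRy",
    "Torak $841.48 T :",
]

# Hypothetical pattern: an optional dollar sign followed by digits
# and exactly two decimal places.
AMOUNT = re.compile(r"\$?(\d+\.\d{2})")


def first_amount(line):
    """Return the first money-like amount in a line, or None if there is none."""
    match = AMOUNT.search(line)
    return Decimal(match.group(1)) if match else None


# Assumes the lines are in subtotal, tax, total order.
subtotal, tax, total = (first_amount(line) for line in ocr_lines)

# Decimal avoids floating-point rounding when comparing currency values.
totals_consistent = (
    None not in (subtotal, tax, total) and subtotal + tax == total
)
print(subtotal, tax, total, totals_consistent)
```

A check like this only covers the totals block, so it says nothing about the item lines, which are the worst part of the output.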
I wonder how you would pre-process this image so that the OCR achieves better accuracy and precision.
Thank you so much.