I am trying to read the text from a U.S. penny to orient the coin.
The original image is from
https://www.usmint.gov/wordpress/wp-content/uploads/2024/05/2024-lincoln-penny-uncirculated-obverse-philadelphia.jpg
I have extracted the word LIBERTY, but I am having difficulty getting Tesseract to read it.
The code I am using is:
import cv2
import pytesseract

def GetLibertyCroppedArea(img):
    # Crop box in the 400x400 image:
    # ll = 10,242
    # ur = 130,190
    top = 190
    bot = 240
    lft = 10
    rgt = 130
    cropped_area = img[top:bot, lft:rgt]
    # Upscale the 120x50 crop by 3x
    cropped_area = cv2.resize(cropped_area, (360, 150))
    return cropped_area

def LoadImage(fn):
    # Load as grayscale and normalize to 400x400
    img = cv2.imread(fn)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (400, 400))
    return img

img = LoadImage(fn)
# rotated_image comes from a rotation step not shown here
liberty = GetLibertyCroppedArea(rotated_image)
text = pytesseract.image_to_string(liberty, config='--psm 8')
text = text.rstrip()
print(text)
I have tried various psm values without success.
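Trying several psm values can be sketched as a small loop. This is only an illustration: the helper name `sweep_psm`, the default list of modes, and the optional `ocr` parameter are assumptions made for the example, not part of pytesseract.

```python
def sweep_psm(image, modes=(6, 7, 8, 10, 13), ocr=None):
    """Run OCR once per page segmentation mode and collect the stripped text."""
    if ocr is None:
        # Imported lazily so a stand-in OCR callable can be passed instead.
        import pytesseract
        ocr = pytesseract.image_to_string
    results = {}
    for psm in modes:
        # Same config string format as the single call above.
        results[psm] = ocr(image, config=f"--psm {psm}").strip()
    return results
```

Calling `sweep_psm(liberty)` returns a dict mapping each psm value to the recognized text, which makes it easy to compare modes side by side.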
What preprocessing steps are necessary for Tesseract to have a better chance of recognition?