The OCR result I’m getting is 4qhw3D when it should be 4qhW3D: the uppercase W is being read as a lowercase w. I’ve tried multiple page segmentation (--psm) and engine (--oem) modes, but none of them gives the correct result. I’ve just started with this and don’t have much knowledge of OCR.
Below is the program I’m currently using:
import base64
import io
from PIL import Image, ImageEnhance
import subprocess
base64_image = "iVBORw0KGgoAAAANSUhEUgAAAMsAAAAyCAYAAADyZi/iAAAHEUlEQVR42u1cf2RcSRyPFSsiQqy1ouqIqlNRJapOrXOcqFoRoeJUnRNOnDoR5dSJqghxoqqiVEVFxVKrTlSFqqqKc1RURESJc+L0jxIVEWst72bO912m38x7M/PezMvu2++Hr8SbX/tm5jPz/X7nO6+tjUAgEAgEAoFAIBAIBAKBQCAQCASCCA+hBd8/z2SYyRSTMpPXTHaZVJnU4O8/TCpMJpn00qxxMxBFT4I0k4VV8R5V+Y1h+eeo/BvD8l+j8tsm76+BOpM7TDqi9G8AOCH3mewwWWayyGScyblWIUonH6gWJMsjVOWUQdkMrOZ4crYb1DGFyi9aJouPt0xylsgSBr6rzTA5mWay3A16+5STZRRV+SLuTsxw2aCOFVR2VPP9PzB5wGSET0xOXEjP8vaZLEl+14sEyOKjBqTJpo0oF4WX/NRiZClIBjmjWfYmUk18TMfYmQqKMtxGKen8RpZnyJTIuv0Lvz0HxPwZSF+XtPcuNXYT12WR+jXWSmSJY7ewfM+EMuJK/lKz/Leo3S0H/TWH2ii76l/Y4WQayraOCtgMZBE781Wje5wckWUBVXtLc1cQd5MvTHcn3g5qd8FBf/WjNj657l++2ICHTsTvzU6UC0iN6LM5IUG94wb038itWTbR63UHE1SCMqiSvL0DJmugO/eG1HkFVftS8918bMCzdya7E28HtTviYIwzWM1MYjGC/qlHteUajSjcENwSXuSGrQ5jRXqYPNUwAl/5OrqBrnwkH+jOK4q29riuH1BnXmK3tCve8Rch/0N4Nq/rVeP1SyZTztFYizhIaueGRUrEH81KllnRCLOot/agFVaF9zDZI5GFoQvcoroemgsB9W6ivIOK91wW8l6BZyPCsxVF+UuovQ1H43wKtfNngmTpljgwTjcbUc6jc4EzFsnyTDJJ+aHYWVAJMvC/aAhWYpDFN6x34GAsD/n4udGPTD6i/OsB9T5A+aYVqs2BkDcvLBRadotk1b3vaKxvo3Z+TdjbWEZVXm829UtcRWdsdRjL+p3kpHcwJP+gZOUxJct/q7I/YSX5ByTqTlGSbwSriJq23lbIDlVUuIBd2ytF5ITgC0p3wmS5hhe3ZiKLuNJsyQ6NYpBlCxWd0CgzEZMsNdXWLjmlvxugPmrZLRBz5eMRSnsopN00sFd6LC6G3Bt1H7XBd9iBpL2NyBGiVAMbiShnUQdetNVhkk75S9N9mgFvWVSyLGq0gXeNlYB86zreG+S8GEVpoyp3KXjtlKphFO9gACq64ScOyNJh4rpuFKJkQF1R6sgRyTKjq/NLyk7HIMuwRv15nQFD3iyO2YB8or3Si9IKKs8Tcq5wzDsiSxU0iYIN17yl31htNvWLn3V0WSbLcoz4qFIMshQ0FwrlgEl2oDcBNlBohDCKCDgvSV81JXzMnaUOalmuAchSa3Si9CP1q2S7wyRep5zB74vsOo44YPUQV2doBDHEQYWqgBCyLrXbwKaooza6LTtwvuJeL1gUsWp8knaW8FV1XTc2KAZZqjHczZmEyRJWPz4jKqH0iipCGNktFZSGgxvXHI59l8QtvxF24NrSNgu6L7HraivGLuC46sQxkuUeyvobSt8X0k4E1HEiyG6BM6dQz5yDObCme9bhgCzFpjnFR1v+VVdGXop2Fmy3rApp50SVRtGe6OEbEJ7jaIOhBObAkG7sW0ufs3gOkAKbxVOoLthuyULadYMbjY+FvD/Bs06JvdKZwBzI6saHJXCCP05kOSZvmAuXqGT1L0nslasGK+oTeDZ8XAd0uh4py7FhOUmERh+R5ejZgck5y0yDkQVfYLoDz/fE+yuKOvpEWzHAHppLaA5gL99eQmSZjfMxj6ZARJuleBwn+I7IcmQHgOiH/2OsNNsUXbdnVJ42h+N5TfcuvsX7LLLvEwwSWQ7L4diwSY0yN+LEhjkii8y2
EO+vPI6gr08G2UKOx7JDcv1g0iVZIDYNf8vhaVsaYTHquOY66tjVO0nsFjHE5XvNNn8IKP+Zl83gHZ7rBEMK+QuS25i7YYegFu7gz0uGc9NWoGhqyCIMKAa3AfoFtQvfZ3nSgGSZCzHbTmm2+WVIHbMxxmUdwpdKQIiMkCcPz++hMyGr0RswjnkICp2AcZd93eVt0PUJIov5TcnNRnIdh5xN+Phg2B8fA+opxR0XQxx4im+SWXYI8XO3W57BBwdbjiwCYSqad/DzhitaUmTJBqyUS4Z9UQ4IbGxPiCx1+A19CRDSA2fNba9VvrFs2SOyAJ6xOsgODN6liNt/ImSBMquSyTBm2AfjkjpeR+zPAkQYTEMfrnqHX7LxV/N9UH2W4Fp1rwNC1qCdbe/wW8djvqpNaBKCEghEFgKBQGQhEIgsBAKRhUAgshAIRBYCIU1kqYpCPUIgEAgEAoFAIBAIBAKBQGj7F2RM+tdTTBtoAAAAAElFTkSuQmCC"
image_data = base64.b64decode(base64_image)
image = Image.open(io.BytesIO(image_data))
if image.mode == 'RGBA':
    image = image.convert('RGB')
enhancer = ImageEnhance.Contrast(image)
factor = 5.0
enhanced_image = enhancer.enhance(factor)
enhanced_image.save('enhanced_image.jpg', 'JPEG')
result = subprocess.run(['tesseract', 'enhanced_image.jpg', 'stdout', '-l', 'eng'], capture_output=True, text=True)
ocr_result = result.stdout.strip()
print(ocr_result)
Can anyone help me improve the accuracy of the OCR process for this CAPTCHA? Any tips or modifications to the image preprocessing or Tesseract configuration that could help?
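To make the --psm and --oem combinations easier to compare, something like the following sketch could be used. It is only an illustration: `build_tesseract_cmd` and `run_ocr` are hypothetical helper names, not part of Tesseract or of the program above. The `--psm`, `--oem`, and `-c tessedit_char_whitelist=...` options are standard Tesseract command-line options, but whether the whitelist is honoured can depend on the Tesseract version and engine, so that part is an assumption to verify locally. Nothing here is claimed to fix the w/W confusion.

```python
import shutil
import string
import subprocess

# Assumption: the CAPTCHA only contains ASCII letters and digits.
ALNUM = string.ascii_letters + string.digits


def build_tesseract_cmd(image_path, psm=7, oem=3, whitelist=None, lang="eng"):
    """Build the tesseract argument list (hypothetical helper).

    psm 7 treats the image as a single text line, psm 8 as a single word.
    oem 3 is the default engine selection, oem 1 is LSTM only.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang,
           "--psm", str(psm), "--oem", str(oem)]
    if whitelist:
        # Restrict recognition to the given characters (version dependent).
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_ocr(image_path, **options):
    """Run tesseract with the given options and return the stripped text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(build_tesseract_cmd(image_path, **options),
                            capture_output=True, text=True)
    return result.stdout.strip()


# Print the candidate command lines without running them.
for psm in (7, 8, 13):
    print(" ".join(build_tesseract_cmd("enhanced_image.jpg", psm=psm,
                                       whitelist=ALNUM)))
```

Calling `run_ocr('enhanced_image.jpg', psm=8, whitelist=ALNUM)` for each mode would then give one result per configuration to compare against the expected 4qhW3D.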