Problem extracting correct data from PDFs with Tesseract

I’m trying to extract specific data from multiple PDFs. I start by splitting the example image (Picture 1) into cells along its horizontal and vertical lines. I then crop each cell and run pytesseract OCR on it to extract the text, as shown in Picture 2.
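
To make the steps concrete, here is a minimal sketch of that pipeline. It is not my exact code: it assumes the page has already been thresholded into a grid of 0/1 values (1 = dark pixel), and it finds ruling lines with a simple projection profile (rows or columns that are mostly dark) rather than any particular library routine. The names `line_runs`, `find_cells`, `ocr_cells` and the `min_fraction` threshold are made up for illustration. The OCR step assumes Pillow and pytesseract are installed, and `--psm 6` is only a guess at a suitable page segmentation mode.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]            # (start, end), end-exclusive
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), PIL crop order


def line_runs(dark_counts: Sequence[int], span: int,
              min_fraction: float = 0.8) -> List[Run]:
    """Group consecutive indices whose dark-pixel count covers at least
    min_fraction of span (hypothetical threshold, tune per document)."""
    runs: List[Run] = []
    start = None
    for i, count in enumerate(dark_counts):
        is_line = count >= min_fraction * span
        if is_line and start is None:
            start = i                      # a ruling line begins here
        elif not is_line and start is not None:
            runs.append((start, i))        # the ruling line ended
            start = None
    if start is not None:
        runs.append((start, len(dark_counts)))
    return runs


def find_cells(binary: Sequence[Sequence[int]],
               min_fraction: float = 0.8) -> List[Box]:
    """Return the boxes lying between neighbouring horizontal and vertical
    ruling lines. `binary` holds 1 for dark pixels and 0 for light ones."""
    height, width = len(binary), len(binary[0])
    row_counts = [sum(row) for row in binary]
    col_counts = [sum(col) for col in zip(*binary)]
    h_lines = line_runs(row_counts, width, min_fraction)
    v_lines = line_runs(col_counts, height, min_fraction)
    boxes: List[Box] = []
    # A cell starts where one line ends and stops where the next begins.
    for (_, top), (bottom, _) in zip(h_lines, h_lines[1:]):
        for (_, left), (right, _) in zip(v_lines, v_lines[1:]):
            boxes.append((left, top, right, bottom))
    return boxes


def ocr_cells(image, boxes: Sequence[Box], config: str = "--psm 6") -> List[str]:
    """Crop each box from a PIL image and OCR it with pytesseract."""
    import pytesseract  # imported here so cell detection works without it
    return [
        pytesseract.image_to_string(image.crop(box), config=config).strip()
        for box in boxes
    ]
```

Usage would look roughly like `boxes = find_cells(binary)` followed by `texts = ocr_cells(Image.open("page.png"), boxes)`, where `page.png` is a placeholder file name. A projection profile only catches lines that span most of the page width or height, so tables with partial rules or skewed scans would need a different line detector.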