I need to make text extraction from PDFs more lenient (less strict), because some text in the middle of the PDF is not extracted even though I know it is there. I'm testing PyMuPDF for extraction speed and accuracy, especially with punctuation and accented characters. Does anyone know a way to do this?
Here is my code, which extracts the text based on my project's coordinates:
https://colab.research.google.com/drive/1ngvHSPkG-vM8Cvf1JIhcuu6qdfvo4QUe?usp=sharing
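One way to make coordinate-based extraction more lenient is a minimal sketch like the one below. It assumes the notebook's coordinates can be expressed as a rectangle `(x0, y0, x1, y1)` in PDF points. It uses PyMuPDF's `page.get_text("words")`, which returns one tuple `(x0, y0, x1, y1, text, block_no, line_no, word_no)` per word. Instead of keeping only words that lie entirely inside the area, it keeps any word that overlaps the area by at least a fraction `min_overlap` of its own size. It also applies Unicode NFC normalization so that a letter and its combining accent come out as one character. The helper names (`overlap_ratio`, `words_in_area`, `extract_area_text`) and the default threshold of 0.2 are hypothetical choices for illustration, not part of PyMuPDF.

```python
import unicodedata
from typing import List, Sequence, Tuple

# Layout of each tuple returned by PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def overlap_ratio(word_box: Sequence[float], area: Box) -> float:
    """Fraction (0.0 to 1.0) of the word's bounding box that lies inside `area`."""
    ix0, iy0 = max(word_box[0], area[0]), max(word_box[1], area[1])
    ix1, iy1 = min(word_box[2], area[2]), min(word_box[3], area[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    word_area = (word_box[2] - word_box[0]) * (word_box[3] - word_box[1])
    return inter / word_area if word_area > 0 else 0.0


def words_in_area(words: Sequence[Word], area: Box, min_overlap: float = 0.2) -> List[str]:
    """Keep words that overlap `area` by at least `min_overlap` of their own size.

    A word that only partly sticks into the area is still kept, which is
    more lenient than requiring the whole word to be inside the area.
    """
    picked = []
    for w in words:
        ratio = overlap_ratio(w[:4], area)
        if ratio > 0 and ratio >= min_overlap:
            picked.append(w)
    # Restore reading order using PyMuPDF's block / line / word numbers.
    picked.sort(key=lambda w: (w[5], w[6], w[7]))
    # NFC joins a base letter and a combining accent into one character,
    # e.g. "a" + U+0303 (combining tilde) becomes "\u00e3".
    return [unicodedata.normalize("NFC", w[4]) for w in picked]


def extract_area_text(pdf_path: str, page_number: int, area: Box, min_overlap: float = 0.2) -> str:
    """Extract the text of one page region with PyMuPDF."""
    # Imported here so the pure-Python filter above works without PyMuPDF;
    # newer PyMuPDF releases also allow "import pymupdf".
    import fitz

    with fitz.open(pdf_path) as doc:
        words = doc[page_number].get_text("words")
    return " ".join(words_in_area(words, area, min_overlap))
```

Usage would look like `extract_area_text("document.pdf", 0, (50, 100, 400, 300))`, where the file name and coordinates are placeholders. Lowering `min_overlap` keeps more words that sit on the edge of the rectangle. Raising it gets closer to strict containment. If words are still missing even when the area covers the whole page, the text may not exist as a text layer at all (for example, it is part of an embedded image), and no extraction setting will recover it.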