I am trying to parse an unstructured PDF file and extract information about some of its input fields, such as radio buttons, but I am not sure how to do that.
I tried using get_fields() from PyPDF2, but it does not return anything because of the nature of the PDF. When I use extract_text(), it just gives “YES” or “NO” for the radio button components. Also, when I print the full extracted text, there are often random spaces between the words.
Is there any method of accurately parsing unstructured PDF files?
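For reference, here is a minimal sketch of the kind of attempt described above. It assumes the PyPDF2 2.x/3.x naming (PdfReader, get_fields, extract_text); the helper names describe_pdf and main and the file name form.pdf are hypothetical and only there for illustration.

```python
def describe_pdf(reader):
    """Return (fields, text) for an already-open PDF reader object.

    `reader` is assumed to behave like PyPDF2's PdfReader: it exposes
    get_fields() and an iterable `pages` whose items have extract_text().
    """
    # get_fields() returns None when no form data can be located,
    # so fall back to an empty dict to keep the result easy to inspect.
    fields = reader.get_fields() or {}

    # Join the text of every page; extract_text() may return an empty value.
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    return fields, text


def main(path):
    # Imported here so the helper above can be tried without PyPDF2 installed.
    from PyPDF2 import PdfReader

    fields, text = describe_pdf(PdfReader(path))
    print("form fields:", fields)  # empty for the PDF in question
    print(text)                    # contains the "YES"/"NO" labels and odd spacing
```

Calling main("form.pdf") on the file in question would print an empty field mapping followed by the raw text, which is the behaviour described above.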