I’m currently working on a project where I need to extract all paragraphs from a PDF document.
Do you have any ideas on how to do it? Which library should I use? Example code would be most appreciated.
I currently extract all sentences using poppler, and I have a pretty decent TOC with pdfstructure. However, I can’t manage to extract a list of all paragraphs.