I’d like to batch-edit PDF files. Specifically, I want to remove the line containing the “X” character from all of my PDF files, but I can’t get it to work.
My document has 3 parts:
1. A header with the company logo and customer data.
2. A table containing all the operations. This is the part that contains the line with the “X” character that I want to delete.
3. The bottom of the document.
After these modifications, I want to convert the document back into a PDF.
Do you have any ideas?
I’ve tried using Python to convert my PDF to text, but the extraction came out very badly: broken lines and misplaced column names make the document completely unreadable. My plan was to edit the text and delete the line I want, but that is useless if the text is extracted this poorly.
Here is the code:
    from pdfminer.high_level import extract_text

    def pdf_to_text(pdf_path, txt_path):
        # Extract all text from the PDF and write it to a UTF-8 text file
        text = extract_text(pdf_path)
        with open(txt_path, 'w', encoding='utf-8') as txt_file:
            txt_file.write(text)

    # Example usage
    pdf_to_text('votre_fichier.pdf', 'votre_fichier.txt')
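For reference, the filtering step itself would be simple once the text comes out readable. Here is a minimal sketch of what I have in mind, assuming each table row ends up on its own line and that “X” appears only in the rows to delete. The function name `drop_lines_containing` and the sample text are made up for illustration:

```python
def drop_lines_containing(text: str, marker: str = "X") -> str:
    """Return `text` without any line that contains `marker`."""
    # Assumption: one table row per line, and the marker only
    # appears in rows that should be removed.
    kept = [line for line in text.splitlines() if marker not in line]
    return "\n".join(kept)


# Hypothetical sample standing in for cleanly extracted table text
sample = "Operation  Qty  Flag\nDrilling   2\nPainting   1    X\nAssembly   4"
print(drop_lines_containing(sample))
```

This only covers the deletion. It does not solve the extraction problem or the conversion back to PDF, which is where I’m stuck.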