I have a PDF file with 4 pages of data. The first 3 pages contain 3 tables that share the same columns. After some research I found a pretty good Python library called camelot-py. Is it possible to use camelot-py to generate a single CSV file containing all the data from the PDF, given that all the tables have the same columns and data types?
I implemented the first version below, inspired by this post. The problem is that it generates multiple CSV files, one per table found in the PDF file, whereas I need a single CSV file containing all the data from the PDF pages. Is that possible with camelot-py, maybe in conjunction with another library such as pandas?
# First version
import camelot

def pdf2csv_single(pdf_filename, csv_filename):
    tables = camelot.read_pdf(pdf_filename)
    if tables:
        # If any tables were found, export them to CSV
        # (this writes one file per table, which is the problem)
        tables.export(f'{csv_filename}', f='csv')
        print("Data successfully extracted to CSV")
    else:
        print("No table found")
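
For reference, here is a rough sketch of the direction I am considering: take each Camelot table's `.df` (a pandas DataFrame), stack them with `pandas.concat`, and write the result once. The names `combine_frames` and `pdf2csv_combined` are hypothetical helpers of my own, not part of camelot-py. The sketch assumes each extracted table repeats the header as its first row, which may not hold for every PDF, and I have not verified it against my actual file.

```python
import pandas as pd

def combine_frames(frames, first_row_is_header=True):
    """Stack several DataFrames that share the same columns (hypothetical helper)."""
    cleaned = []
    for df in frames:
        if first_row_is_header:
            # Assumption: row 0 of every extracted table holds the column names.
            header = df.iloc[0].tolist()
            df = df.iloc[1:].copy()
            df.columns = header
        cleaned.append(df)
    # ignore_index=True renumbers the rows continuously across all tables.
    return pd.concat(cleaned, ignore_index=True)

def pdf2csv_combined(pdf_filename, csv_filename, pages="all"):
    """Extract every table from the given pages and write one CSV (hypothetical helper)."""
    import camelot  # imported here so combine_frames can be used without camelot

    tables = camelot.read_pdf(pdf_filename, pages=pages)
    if len(tables) == 0:
        print("No table found")
        return
    combined = combine_frames([table.df for table in tables])
    combined.to_csv(csv_filename, index=False)
    print("Data successfully extracted to CSV")
```

Calling `pdf2csv_combined("input.pdf", "output.csv", pages="1-3")` would restrict extraction to the first three pages (the file names here are placeholders). If the tables do not repeat the header row, passing `first_row_is_header=False` to `combine_frames` keeps every row as data.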