After installing Ghostscript and Camelot, I downloaded foo.pdf and ran the foo.pdf test from the Camelot website (https://camelot-py.readthedocs.io/en/master/).
It creates a zipped folder with a new PDF inside, and now neither the original PDF nor the new one can be opened (examples below). I was expecting a CSV, not a corrupted PDF…
So I’m not sure what I did wrong, or how to fix it.
I’m assuming the problem is a missing dependency, or maybe the Microsoft Edge PDF viewer, but I do have both Ghostscript and Camelot installed.
Any information is appreciated!
Test below, from the Camelot homepage:
import camelot
tables = camelot.read_pdf('foo.pdf')
print(tables)
#<TableList n=1>
tables.export('foo.pdf', f='csv', compress=True) # json, excel, html, markdown, sqlite
tables[0]
#<Table shape=(7, 7)>
tables[0].parsing_report
{
'accuracy': 99.02,
'whitespace': 12.24,
'order': 1,
'page': 1
}
tables[0].to_csv('foo.pdf') # to_json, to_excel, to_html, to_markdown, to_sqlite
tables[0].df # get a pandas DataFrame!
When I replace foo.pdf with a fresh copy, this returns a normal result in the VS Code terminal.
Result:
<TableList n=1>
When I ask it to print the table, it works just fine. The problem is that the output file is corrupted, and so is the original file:
Chart = tables[0].df # get a pandas DataFrame!
print(Chart)
Result:
# Test of Ghostscript installation:
import ctypes
from ctypes.util import find_library
lib = find_library("".join(("gsdll", str(ctypes.sizeof(ctypes.c_voidp) * 8), ".dll")))
#<name-of-ghostscript-library-on-windows>
print(lib)
Result:
C:\Program Files\gs\gs10.03.0\bin\gsdll64.dll
So I can see the dependency is there.
If I run the Camelot test a second time, I get the errors below about the PDF being unreadable.
PS B:\Py> & b:/Py/.conda/python.exe b:/Py/Camelot_Test.py
Traceback (most recent call last):
File "b:\Py\Camelot_Test.py", line 3, in <module>
tables = camelot.read_pdf('foo.pdf')
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\Py\.conda\Lib\site-packages\camelot\io.py", line 113, in read_pdf
tables = p.parse(
^^^^^^^^
File "B:\Py\.conda\Lib\site-packages\camelot\handlers.py", line 169, in parse
self._save_page(self.filepath, p, tempdir)
File "B:\Py\.conda\Lib\site-packages\camelot\handlers.py", line 108, in _save_page
infile = PdfFileReader(fileobj, strict=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\Py\.conda\Lib\site-packages\PyPDF2\_reader.py", line 1971, in __init__
super().__init__(*args, **kwargs)
File "B:\Py\.conda\Lib\site-packages\PyPDF2\_reader.py", line 317, in __init__
self.read(stream)
File "B:\Py\.conda\Lib\site-packages\PyPDF2\_reader.py", line 1409, in read
self._find_eof_marker(stream)
File "B:\Py\.conda\Lib\site-packages\PyPDF2\_reader.py", line 1465, in _find_eof_marker
raise PdfReadError("EOF marker not found")
PyPDF2.errors.PdfReadError: EOF marker not found
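Since PyPDF2 says it cannot find the EOF marker, one way to check whether foo.pdf (and the file inside the zip) is still a real PDF is to look at the raw bytes: a PDF normally starts with `%PDF-` and contains an `%%EOF` marker at the end. Below is a minimal sketch of that check. It assumes the archive written by `tables.export(..., compress=True)` is called `foo.zip` and sits next to foo.pdf (adjust the name if yours differs). `looks_like_pdf`, `report_file` and `report_zip` are my own hypothetical helpers, not Camelot or PyPDF2 functions, and the check is only a rough heuristic, not PyPDF2's actual parsing logic.

```python
import zipfile
from pathlib import Path
from typing import List


def looks_like_pdf(data: bytes) -> bool:
    """Rough heuristic: '%PDF-' header at the start and an '%%EOF' marker somewhere.

    This is NOT how PyPDF2 validates files; it only catches files whose
    contents are obviously not PDF (e.g. plain text written over them).
    """
    return data.startswith(b"%PDF-") and b"%%EOF" in data


def _describe(name: str, data: bytes) -> str:
    """Format a one-line verdict plus the first 40 bytes of the file."""
    verdict = "looks like a PDF" if looks_like_pdf(data) else "does NOT look like a PDF"
    return f"{name}: {verdict}; starts with {data[:40]!r}"


def report_file(path: str) -> str:
    """Check a single file on disk."""
    return _describe(path, Path(path).read_bytes())


def report_zip(path: str) -> List[str]:
    """Apply the same check to every member of a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [_describe(name, archive.read(name)) for name in archive.namelist()]


if __name__ == "__main__":
    # Hypothetical file names from the test above; only checked if they exist.
    if Path("foo.pdf").exists():
        print(report_file("foo.pdf"))
    if Path("foo.zip").exists():
        for line in report_zip("foo.zip"):
            print(line)
```

If the first bytes printed for foo.pdf look like comma-separated text instead of `%PDF-`, that would explain why PyPDF2 can no longer find the EOF marker.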