I am trying to scrape multiple PDFs using pdfquery, and it has been working perfectly up until now. When I load one particular PDF, it produces an XML tree containing only the bbox for the entire page, with no elements for any of the other cells in the PDF.
The PDF causing the issue is much larger than the others I have been working with (about 1400 x 1800 pt), which I thought was the cause, but scaling it down gave me the same result. Does pdfquery have any limitations I am unaware of? Thanks for the help!
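
One way to confirm what the tree actually contains is to dump it to XML and count the element tags. The following is a minimal sketch, not a definitive diagnostic: `count_layout_tags` is a hypothetical helper, the file names in the comments are placeholders, and `sample` is a simplified stand-in for a page-only tree rather than real output from the problem file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The XML dump is assumed to come from pdfquery, for example:
#   pdf = pdfquery.PDFQuery("problem.pdf")   # placeholder file name
#   pdf.load()
#   pdf.tree.write("dump.xml", pretty_print=True)


def count_layout_tags(xml_text):
    """Return a Counter mapping each element tag in the dump to its count."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Simplified, hypothetical stand-in for a tree that holds only the page element.
sample = '<pdfxml><LTPage bbox="[0, 0, 1400, 1800]" page_index="0"></LTPage></pdfxml>'

counts = count_layout_tags(sample)
for tag, number in sorted(counts.items()):
    print(tag, number)
```

If the counts show only the page element and no text elements such as `LTTextLineHorizontal`, the layout analysis found nothing to extract on that page, which matches the symptom described above.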