LangChain PDFLoader for JS works on some PDFs and not others
I am trying to build an AI SaaS using Next.js, AWS S3, NeonDB, and Pinecone that takes in a PDF and lets you chat with OpenAI about its contents. I am currently writing a function that takes in the PDF and uses PDFLoader from LangChain to convert it into text strings. When I test this function, though, certain PDFs work and others don't. Simple PDFs exported from, say, Google Docs or Word are processed without trouble and return an object with the contents. However, when I test a PDF that was created with print-to-PDF from a webpage, I get an empty array. Here is an example: https://www.congress.gov/118/bills/hr5009/BILLS-118hr5009pcs.pdf
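To narrow down the failing case, it may help to distinguish "the loader returned no documents at all" from "the loader returned pages whose text is empty". The following is only a minimal sketch of such a check, assuming each loaded item exposes a `pageContent` string the way LangChain `Document` objects do. The `LoadedDoc` type and the `summarizeLoad` helper are hypothetical names used for illustration and are not part of LangChain.

```typescript
// Hypothetical shape: mirrors the `pageContent` field of a LangChain Document.
interface LoadedDoc {
  pageContent: string;
  metadata?: Record<string, unknown>;
}

type LoadStatus = "no-documents" | "no-text" | "ok";

interface LoadSummary {
  status: LoadStatus;
  pageCount: number; // number of documents the loader returned
  charCount: number; // total non-whitespace-trimmed characters across all pages
}

// Hypothetical helper: classifies the loader output so the empty case
// can be logged separately from the "pages but no text" case.
function summarizeLoad(docs: LoadedDoc[]): LoadSummary {
  const pageCount = docs.length;
  const charCount = docs.reduce(
    (sum, doc) => sum + doc.pageContent.trim().length,
    0
  );

  if (pageCount === 0) {
    return { status: "no-documents", pageCount, charCount };
  }
  if (charCount === 0) {
    return { status: "no-text", pageCount, charCount };
  }
  return { status: "ok", pageCount, charCount };
}
```

In the actual function, the array passed to `summarizeLoad` would be whatever `await loader.load()` returns, and the summary could be logged per file to compare a working PDF against the failing one.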