Currently, I’m using Google’s Vision AI to extract dates and prices from PDF files. I proceed with the following steps:
- Extract the text from the PDF. Vision AI returns the characters on each page together with their positions.
- Identify words whose format looks like a date or a price.
- Determine the position of those words in pixels, measured from the upper-left corner of each page.
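Steps 2 and 3 can be sketched roughly like this. Note that the helper name, the regexes, and the input shape are my own illustration, not Vision AI API calls — I'm assuming the response has already been flattened into `(text, normalized_vertices)` pairs, since for PDF input Vision AI reports bounding boxes as normalized vertices in the 0–1 range, which you scale by the page size to get pixels:

```python
import re

# Illustrative patterns only -- real date/price formats vary by locale.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
PRICE_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")

def find_targets(words, page_width_px, page_height_px):
    """words: list of (text, normalized_vertices) pairs, where
    normalized_vertices is [(x, y), ...] with coordinates in 0..1.
    Returns (text, x_px, y_px) for each date/price-looking word,
    using the first (top-left) vertex scaled to pixels."""
    hits = []
    for text, verts in words:
        if DATE_RE.search(text) or PRICE_RE.search(text):
            x, y = verts[0]  # top-left vertex of the bounding box
            hits.append((text, round(x * page_width_px), round(y * page_height_px)))
    return hits

# Hand-made sample mimicking a flattened page at 1654x2339 px (A4 @ 200 dpi).
words = [
    ("Invoice", [(0.10, 0.05)]),
    ("12/31/2024", [(0.50, 0.05)]),
    ("$1,234.56", [(0.80, 0.90)]),
]
print(find_targets(words, 1654, 2339))
# → [('12/31/2024', 827, 117), ('$1,234.56', 1323, 2105)]
```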
For files with fewer than 5 pages, I can use the annotate method (synchronous request); the results come back directly with a pretty good response time, about 5 s per file.
For files with more pages, I have to use the asyncBatchAnnotate method (asynchronous request). The results are written as JSON files to Google Cloud Storage, so the response time is quite slow; for large files it can take more than a minute to process all the pages.
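For reference, the JSON shards that asyncBatchAnnotate writes to GCS can be walked offline without calling the API again. A sketch, assuming the `responses` → `fullTextAnnotation` → `pages` → `blocks` → `paragraphs` → `words` → `symbols` shape of the output JSON (the sample dict below is my own heavily trimmed mock, not real API output):

```python
import json

def words_with_boxes(shard):
    """Walk one asyncBatchAnnotate output shard (parsed JSON dict)
    and yield (word_text, page_number, normalized_vertices)."""
    for page_resp in shard.get("responses", []):
        page_no = page_resp.get("context", {}).get("pageNumber", 0)
        annotation = page_resp.get("fullTextAnnotation", {})
        for page in annotation.get("pages", []):
            for block in page.get("blocks", []):
                for para in block.get("paragraphs", []):
                    for word in para.get("words", []):
                        # A word's text is the concatenation of its symbols.
                        text = "".join(s["text"] for s in word.get("symbols", []))
                        verts = word.get("boundingBox", {}).get("normalizedVertices", [])
                        yield text, page_no, verts

# Trimmed mock of one output shard.
shard = json.loads("""{
  "responses": [{
    "context": {"pageNumber": 3},
    "fullTextAnnotation": {
      "pages": [{"blocks": [{"paragraphs": [{"words": [
        {"symbols": [{"text": "$"}, {"text": "9"}, {"text": "9"}],
         "boundingBox": {"normalizedVertices": [{"x": 0.7, "y": 0.2}]}}
      ]}]}]}]
    }
  }]
}""")
print(list(words_with_boxes(shard)))
# → [('$99', 3, [{'x': 0.7, 'y': 0.2}])]
```

This doesn't speed up the Vision AI call itself, but it keeps the post-processing (finding dates/prices and their coordinates) independent of the API latency.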
Previously, I tried using pdfminer and pytesseract to extract information from the files, but their processing speed was poor, so I switched to Vision AI.
I also tried popular LLM services such as ChatGPT and Gemini, but they either gave poor results or had slow processing times.
Is there any other option for extracting information together with its coordinates from a PDF with a large number of pages (about 20–100) while still keeping a good response time?