I’m working on a classification task. I want the model to identify if the issue mentioned in a table is related to a specific topic. To do so I initially provided text only for context. I would now like to add images (.jpg files) containing all the knowledge base on the topic (images are pdf transformed into jpg that contain: images, tables and text).
Will the model be able to use this additional context ?
Considering it is a pretty large base ~600 pages of pdf, hence ~600 image files what is the optimal way of working?
- Merge all the image files into one? (if so, what technique do you recommend, (aspose crashed my kernel))
- Input all the image individually?
- Vectorize the images and follow a RAG architecture?
Thanks for taking the time
Amac is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.