I ran into an issue while extracting data from an image for my RAG system. The extracted text comes out as a single run, with the details of two people interleaved:
‘Shiek Muhammed bin Khursid Mr Shiekh Omar Abdulla Managing Director CEO Email: [email protected] Email: soabdulla786@gmail,com Contact: 553151 Contact: 556523’
This formatting makes it hard for the LLM to tell which details belong to which person, so it struggles to give accurate answers.
Any suggestions on how to improve this?
I used an unstructured method to extract the text. I want to extract the text in a format that the LLM in my RAG system can easily understand.
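To illustrate the kind of output I am after: if the extractor exposed word positions, the two contacts could be separated by column before the text reaches the LLM. Below is a rough Python sketch of that idea under stated assumptions. The function `split_columns`, the `(text, x, y)` tuples, the pixel coordinates, and the `split_x` boundary are all hypothetical and made up for illustration; they do not come from any particular OCR library.

```python
def split_columns(words, split_x, line_tol=5):
    """Group word boxes into a left and a right column, then into lines.

    words    : list of (text, x, y) tuples (hypothetical OCR output,
               x/y = top-left corner of each word in pixels)
    split_x  : x coordinate assumed to separate the two columns
    line_tol : a word starts a new line when its y is more than this
               many pixels below the first word of the current line
    """
    columns = {"left": [], "right": []}
    for word in words:
        columns["left" if word[1] < split_x else "right"].append(word)

    blocks = {}
    for name, items in columns.items():
        lines = []      # each entry is a list of words on one visual line
        line_y = None   # y of the first word of the current line
        for word in sorted(items, key=lambda w: w[2]):  # top to bottom
            if line_y is None or word[2] - line_y > line_tol:
                lines.append([])
                line_y = word[2]
            lines[-1].append(word)
        # Within each line, order words left to right and join them.
        blocks[name] = "\n".join(
            " ".join(w[0] for w in sorted(line, key=lambda w: w[1]))
            for line in lines
        )
    return blocks


# Made-up coordinates for part of the sample text above.
words = [
    ("Shiek", 10, 10), ("Muhammed", 60, 11), ("bin", 140, 10), ("Khursid", 170, 12),
    ("Mr", 400, 10), ("Shiekh", 430, 11), ("Omar", 490, 10), ("Abdulla", 540, 10),
    ("Managing", 10, 40), ("Director", 90, 40),
    ("CEO", 400, 40),
    ("Contact:", 10, 70), ("553151", 80, 70),
    ("Contact:", 400, 70), ("556523", 470, 70),
]

blocks = split_columns(words, split_x=300)
for name, text in blocks.items():
    print(f"[{name}]\n{text}\n")
```

Each block would then hold one person's details on separate lines, which could be stored as its own chunk in the RAG index. A fixed `split_x` is the simplest possible choice here; a real layout would probably need the column boundary to be detected rather than hard-coded.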