I am parsing PDFs with unstructured for use in an application. Later I would like to highlight some of the extracted text passages in the original PDF. I have seen that PyMuPDF can do that. However, I am running into cases where the PyMuPDF search functionality does not find the extracted text again. This appears to be related to line breaks and quotation marks. I have seen that the search functionality in PyMuPDF accepts some flags, but I have not found any documentation for them or for how they correspond to what unstructured does during extraction. Does anyone have experience doing something like this and know which settings or tricks can be used to align the two and make this work?
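A possible workaround is to skip `page.search_for` and match at word level instead. The sketch below is an assumption-laden illustration, not PyMuPDF's search flags and not verified against unstructured's actual output. It takes the word list that PyMuPDF's `page.get_text("words")` returns (tuples of `x0, y0, x1, y1, word, block_no, line_no, word_no`), reduces both the page words and the extracted passage to lowercase letters and digits only, and finds the passage in the concatenated page text. Dropping everything that is not alphanumeric is the assumed normalization rule: it makes line breaks, straight vs. curly quotes and end-of-line hyphens irrelevant, at the cost of possible false matches for very short passages. The helper names `normalize` and `find_passage_rects` are hypothetical.

```python
import unicodedata
from typing import List, Optional, Sequence, Tuple

# One entry of PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
WordTuple = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]


def normalize(text: str) -> str:
    """Hypothetical helper: keep only lowercase letters and digits.

    NFKC + casefold unfolds ligatures such as "ﬁ" into "fi". Dropping every
    non-alphanumeric character removes whitespace, line breaks, quotes
    (straight or curly) and hyphens, so differences in those are ignored.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def find_passage_rects(words: Sequence[WordTuple], passage: str) -> Optional[List[Rect]]:
    """Hypothetical helper: bounding boxes of the page words covering `passage`.

    Returns None if the normalized passage does not occur on the page.
    Only the first occurrence is used, and words that overlap the passage
    only partially are returned whole.
    """
    pieces: List[str] = []
    owner: List[int] = []  # owner[k] = index of the word that produced character k
    for i, w in enumerate(words):
        norm = normalize(w[4])
        pieces.append(norm)
        owner.extend([i] * len(norm))
    haystack = "".join(pieces)

    needle = normalize(passage)
    if not needle:
        return None
    start = haystack.find(needle)
    if start == -1:
        return None

    # Every word that contributed at least one character to the match.
    hit = sorted(set(owner[start:start + len(needle)]))
    return [tuple(words[i][:4]) for i in hit]


# Assumed usage with PyMuPDF (not executed here):
#   import fitz  # PyMuPDF
#   doc = fitz.open("input.pdf")
#   page = doc[page_number - 1]  # unstructured's page numbers are 1-based, I believe
#   rects = find_passage_rects(page.get_text("words"), element.text)
#   if rects:
#       page.add_highlight_annot([fitz.Rect(r) for r in rects])
#   doc.save("highlighted.pdf")
```

This highlights whole words rather than exact character ranges, and a passage that spans a page break would have to be split per page first.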
Cheers