I have a PDF that contains a table along with some text and embedded links. I want the extracted text to match the input, preserving the table formatting and including the embedded links.
I used the Python library pymupdf4llm. The table comes out in the correct format, but the embedded links inside it are missing. I want the output to be exactly the same as the input, with clickable hyperlinks.
I have attached the input PDF. I want the output to contain the table as well as the embedded links. Currently the output contains only the text "check link", whereas I want "check link ["https://github.com/UB-Mannheim/tesseract/wiki"]", and likewise for each link. This is the code I am currently using, which produces the table with just the text:

    import pymupdf4llm
    md_text = pymupdf4llm.to_markdown("attached_table_links.pdf")
    print(md_text)
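
One workaround I am considering, as a rough sketch rather than a confirmed solution: read the link annotations separately with PyMuPDF (`page.get_links()` and `page.get_textbox()`), then append each URL after its anchor text in the Markdown. The helper names `collect_links` and `annotate_links` are my own, not part of pymupdf4llm, and the sketch assumes that the text under each link rectangle appears verbatim in the Markdown and that links come back in the same order as the text.

```python
def collect_links(pdf_path):
    """Return (anchor_text, uri) pairs for every external link in the PDF."""
    # Imported lazily so annotate_links() can be used without PyMuPDF installed.
    import pymupdf  # older PyMuPDF releases expose this module as `fitz`

    pairs = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            for link in page.get_links():
                uri = link.get("uri")
                if not uri:
                    continue  # skip internal links (page jumps etc.)
                # Assumption: the text inside the link rectangle is the anchor text.
                anchor = page.get_textbox(link["from"]).strip()
                if anchor:
                    pairs.append((anchor, uri))
    return pairs


def annotate_links(md_text, pairs):
    """Append ' ["uri"]' after each anchor text, matching left to right."""
    out = []
    cursor = 0
    for anchor, uri in pairs:
        idx = md_text.find(anchor, cursor)
        if idx == -1:
            continue  # anchor not found after the cursor; leave the text as is
        end = idx + len(anchor)
        out.append(md_text[cursor:end])
        out.append(f' ["{uri}"]')
        cursor = end
    out.append(md_text[cursor:])
    return "".join(out)


# Intended usage with the code above (not verified on the attached PDF):
#   md_text = pymupdf4llm.to_markdown("attached_table_links.pdf")
#   print(annotate_links(md_text, collect_links("attached_table_links.pdf")))
```

If the output needs to be clickable when rendered as Markdown, the inserted suffix could instead be built as `[anchor](uri)`, but I have not checked how that interacts with the table formatting.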