I am using the PaddleOCR library to detect and extract tables from images and PDFs. For table detection, I am using the TableBank layout detection model provided by PaddleOCR. It gives good results on most images, but it fails on images that have a complex layout or contain many tables. In those cases the layout detector does not detect the tables accurately, even after I lower its detection threshold.
What can be done to improve the detection accuracy? Alternatively, are there other libraries that can handle this task?
I also tried Tabula and Camelot, but they only detect tables in text-based PDFs, not in images or image-based PDFs. How can I solve this issue?
I tried detecting tables in images that contain five or more tables each. The PaddleOCR TableBank layout detector failed to detect even a single table in those images. Tabula and Camelot support only text-based PDFs, so I cannot use them for images or image-based PDFs either.