Enhancing Document Layout Analysis by Adding Positional and Character Information to CNN Inputs
I am working on document layout analysis and have been exploring CNNs and transformer-based networks for this task. Typically, images are passed to these networks as 3-channel RGB inputs. However, my data source is PDF, from which I can directly extract the exact position and identity of every character. I would like to encode this extra information as additional input channels alongside the RGB image.
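To make the idea concrete, here is a minimal sketch of one way to do this: rasterize word bounding boxes (as extracted by a PDF library such as pdfplumber or PyMuPDF; the `words` list below is a hypothetical stand-in for that output) into an extra channel, then concatenate it with the rendered page image. The specific encoding (a constant for text, boosted for digit-heavy words) is just an illustrative choice, not a standard method.

```python
import numpy as np

def rasterize_text_channel(words, page_w, page_h, out_h, out_w):
    """Rasterize word bounding boxes (in PDF points) into an extra channel.

    `words` is a list of (x0, y0, x1, y1, text) tuples, the kind of output
    a PDF text extractor provides (hypothetical input here). Pixels inside
    a word box get a value encoding a simple character-level feature.
    """
    channel = np.zeros((out_h, out_w), dtype=np.float32)
    for x0, y0, x1, y1, text in words:
        # scale PDF coordinates to raster coordinates
        c0 = int(x0 / page_w * out_w)
        c1 = int(np.ceil(x1 / page_w * out_w))
        r0 = int(y0 / page_h * out_h)
        r1 = int(np.ceil(y1 / page_h * out_h))
        # toy encoding: 0.5 marks text, up to 1.0 for digit-heavy words
        digit_frac = sum(ch.isdigit() for ch in text) / max(len(text), 1)
        channel[r0:r1, c0:c1] = 0.5 + 0.5 * digit_frac
    return channel

# combine with a rendered page image into a 4-channel network input
rgb = np.zeros((256, 256, 3), dtype=np.float32)  # placeholder page render
words = [(72, 72, 300, 90, "Introduction"), (72, 100, 150, 115, "2024")]
text_ch = rasterize_text_channel(words, page_w=612, page_h=792,
                                 out_h=256, out_w=256)
inp = np.concatenate([rgb, text_ch[..., None]], axis=-1)  # (256, 256, 4)
```

On the model side, the only required change is the first convolution's `in_channels` (4 here instead of 3); pretrained weights for the extra channel are typically initialized to zero or to the mean of the RGB filters.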