I'm trying to convert PDF data, including tables, into structured JSON format using Node.js. I've experimented with libraries like `pdf-parser`, `pdf-reader`, and `pdf2json`, but the results haven’t been ideal. The table extraction is inaccurate, the JSON lacks a clear object structure, and I’m having trouble identifying column names or handling empty cells.
Ideally, I’d like to obtain JSON that represents the table structure with objects for each cell, row, and column. This would also allow me to identify column names and deal with empty cells gracefully.
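To make the target concrete, here is a rough sketch of the output shape I have in mind. It assumes the extraction step can hand back text fragments with page coordinates; the `items` array, its `x`/`y`/`text` field names, and the helpers `groupRows` and `tableToJson` are all hypothetical and not the API of any of the libraries above. It also assumes `y` grows downward and that the first row is the header.

```javascript
// Hypothetical input: text fragments with page coordinates. The field names
// (x, y, text) are assumptions, not the output format of a specific library.
const items = [
  { x: 10, y: 100, text: "Name" },
  { x: 80, y: 100, text: "Qty" },
  { x: 150, y: 100, text: "Price" },
  { x: 10, y: 120, text: "Widget" },
  { x: 150, y: 120, text: "4.50" }, // the Qty cell is empty in this row
  { x: 10, y: 140, text: "Gadget" },
  { x: 81, y: 141, text: "2" }, // slightly misaligned fragment
  { x: 150, y: 140, text: "9.00" },
];

// Group fragments into rows: a fragment joins the current row when its y is
// within yTolerance of that row's first fragment. Rows are sorted left to right.
function groupRows(fragments, yTolerance = 2) {
  const sorted = [...fragments].sort((a, b) => a.y - b.y || a.x - b.x);
  const rows = [];
  for (const fragment of sorted) {
    const last = rows[rows.length - 1];
    if (last && Math.abs(last[0].y - fragment.y) <= yTolerance) {
      last.push(fragment);
    } else {
      rows.push([fragment]);
    }
  }
  return rows.map((row) => row.sort((a, b) => a.x - b.x));
}

// Treat the first row as the header, then assign every other fragment to the
// column whose header x position is nearest. Cells with no fragment stay null.
function tableToJson(fragments, yTolerance = 2) {
  const [header, ...body] = groupRows(fragments, yTolerance);
  if (!header) return { columns: [], rows: [] };

  const columns = header.map((h) => ({ name: h.text, x: h.x }));
  const rows = body.map((rowFragments) => {
    // Start with every column present so empty cells show up explicitly.
    const row = Object.fromEntries(columns.map((c) => [c.name, null]));
    for (const f of rowFragments) {
      const nearest = columns.reduce((best, c) =>
        Math.abs(c.x - f.x) < Math.abs(best.x - f.x) ? c : best
      );
      // Join multiple fragments that land in the same cell.
      row[nearest.name] =
        row[nearest.name] === null ? f.text : `${row[nearest.name]} ${f.text}`;
    }
    return row;
  });

  return { columns: columns.map((c) => c.name), rows };
}

const table = tableToJson(items);
console.log(JSON.stringify(table, null, 2));
```

With this shape, each row is an object keyed by column name, and a missing value appears as `null` rather than shifting the remaining cells to the left. Nearest-header matching is only a heuristic: it would likely misplace cells in tables with merged cells, wrapped header text, or right-aligned numeric columns.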