I’m using tabula, which internally uses PDFBox for extraction.
The issue is that I’m getting warning messages about fonts:
WARNING: Using fallback font ‘LiberationSans’ for ‘MP0001’
org.apache.pdfbox.pdmodel.font.PDType1Font
WARNING: Using fallback font LiberationSans for base font Times-Roman
Python Code
import tabula
tabula.read_pdf(pdf_file)
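The real code runs this call once per PDF in a loop. A minimal sketch of that loop follows; the helper name read_all and the reader parameter are hypothetical, added only so the loop can be run without tabula installed, and by default it just calls tabula.read_pdf as above.

```python
from typing import Callable, Dict, Iterable, Optional


def read_all(pdf_files: Iterable[str],
             reader: Optional[Callable] = None) -> Dict[str, object]:
    """Call the reader once per PDF and collect the results by file name."""
    if reader is None:
        # Imported lazily; needs tabula-py and a Java runtime installed.
        import tabula
        reader = tabula.read_pdf

    results = {}
    for pdf_file in pdf_files:
        # The fallback-font warnings are printed during this call,
        # once for every file in the loop.
        results[pdf_file] = reader(pdf_file)
    return results
```

Called as read_all(["a.pdf", "b.pdf"]), this prints the same warnings again for each file.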
Here are my questions around it.
I’m not sure whether this is some kind of font cache being built. It happens for every PDF processed in the loop.
Can these fonts be cached and built only once?
Can someone explain what is going on under the hood?
I also tried installing PDFBox with apt-get install -q -y libpdfbox-java.