Trying to use iText7 8.0.4 to extract text from a PDF file, but the output is only question marks:
???????? ??????????
?
???????????????????????? ???????????????????
???????? ????????????????????????????
???????????????????????? ????????????????????????????
?????????????? ?????????????????????
???????????? ????????????????????
??????????
????? ??????????
????????????????????? ???????????????????????????????????
????????????? ??????????
????????????????? ??????????????????????????????????
??????? ???????????????? ????????????????
...
In Adobe and Chrome, the text can be copied from the PDF correctly. How can I extract this text in C#?
Code used:
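using System.IO;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

MemoryStream pdfStream = ...
pdfStream.Position = 0;
var text = new StringBuilder();
using var pdfDocument = new PdfDocument(new PdfReader(pdfStream));
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
{
    // Create a new strategy per page; a reused instance also returns the text of earlier pages.
    var strategy = new LocationTextExtractionStrategy();
    text.AppendLine(PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy));
}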
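One way to narrow the problem down is to inspect the characters of the string returned by `PdfTextExtractor.GetTextFromPage` before printing it. The `ExtractionDiagnostics.Classify` helper below is hypothetical and not part of iText; it is a minimal sketch that counts literal `?` (U+003F), replacement characters (U+FFFD) and other non-ASCII characters. If the extracted string mostly contains non-ASCII characters, extraction probably worked and the question marks are likely introduced where the text is displayed (for example, a console whose output encoding cannot represent those characters). If it is dominated by `?` or U+FFFD, the problem more likely lies in the extraction itself.

```csharp
using System;
using System.Linq;

// Hypothetical diagnostic helper (not an iText API): classifies the characters of an
// extracted string so a literal '?' can be told apart from real non-ASCII text.
public static class ExtractionDiagnostics
{
    public static (int Literal, int Replacement, int NonAscii) Classify(string text)
    {
        int literal = text.Count(c => c == '?');                          // U+003F
        int replacement = text.Count(c => c == '\uFFFD');                 // Unicode replacement char
        int nonAscii = text.Count(c => c > '\u007F' && c != '\uFFFD');    // everything else above ASCII
        return (literal, replacement, nonAscii);
    }

    public static void Main()
    {
        // Sample input: five Cyrillic letters, one '?', one U+FFFD (written as escapes).
        var (literal, replacement, nonAscii) =
            Classify("\u0422\u0435\u043A\u0441\u0442?\uFFFD");
        Console.WriteLine($"'?': {literal}, U+FFFD: {replacement}, other non-ASCII: {nonAscii}");
        // prints "'?': 1, U+FFFD: 1, other non-ASCII: 5"
    }
}
```

In the loop above, one would call `Classify` on each page's extracted text rather than on the sample string.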
The PDF is available at:
https://wetransfer.com/downloads/d0c0915f3416b4d9cc735a5b8a36443f20240604123056/c0768c