I am using PdfPig to read text from an order PDF, as described in the answer to "Get Text Line By Line From PDF using C#":
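<code>using System.Linq;
using System.Text;

static void GetWordsInReadingOrder(UglyToad.PdfPig.Content.Page page, StringBuilder builder)
{
    // Group words by the exact bottom Y coordinate of their bounding boxes.
    // Groups come out in order of first appearance, and words whose Bottom
    // values differ even slightly end up in separate groups.
    var wordsList = page.GetWords().GroupBy(x => x.BoundingBox.Bottom);
    foreach (var word in wordsList)
    {
        // "esimene" is Estonian for "first": no space before the first word of a line.
        bool esimene = true;
        foreach (var item in word)
        {
            if (!esimene)
                builder.Append(' ');
            esimene = false;
            builder.Append(item.Text);
        }
        builder.AppendLine();
    }
}
</code>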
This splits a single PDF line into multiple lines when the words on it have slightly different x.BoundingBox.Bottom values.
How can I add some tolerance so that words whose Y positions differ only slightly appear on the same line?
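A minimal sketch of one way to add tolerance, under stated assumptions: sort the words top to bottom, start a new line only when a word's Bottom lies more than a chosen tolerance below the first word of the current line, then sort each line left to right. LineGrouping and GroupIntoLines are hypothetical names, not part of PdfPig. The method is generic and takes selector functions, so it can be fed PdfPig's Word objects without depending on PdfPig itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineGrouping
{
    // Hypothetical helper: groups items into text lines. An item joins the
    // current line if its bottom Y is within `tolerance` of the line's anchor
    // (the bottom Y of the first, i.e. highest, item placed on that line).
    public static List<List<T>> GroupIntoLines<T>(
        IEnumerable<T> items,
        Func<T, double> bottom,
        Func<T, double> left,
        double tolerance)
    {
        var lines = new List<List<T>>();
        double anchor = 0;

        // PDF user space has its origin at the bottom left, so a larger Y is
        // higher on the page: descending order walks the page top to bottom.
        foreach (var item in items.OrderByDescending(bottom))
        {
            double y = bottom(item);
            if (lines.Count == 0 || anchor - y > tolerance)
            {
                // Too far below the current line's anchor: start a new line.
                lines.Add(new List<T>());
                anchor = y;
            }
            lines[lines.Count - 1].Add(item);
        }

        // Within each line, order items left to right.
        return lines.Select(line => line.OrderBy(left).ToList()).ToList();
    }
}
```

With PdfPig the call might look like GroupIntoLines(page.GetWords(), w => w.BoundingBox.Bottom, w => w.BoundingBox.Left, 2.0), after which each line can be written with builder.AppendLine(string.Join(" ", line.Select(w => w.Text))). The tolerance of 2.0 points is an assumed starting value and would need tuning to the document's font size. Comparing against the line's first word rather than the previous word keeps a run of slightly offset words from chaining two neighbouring lines together.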
I suspect rounding should be added to this line:
<code>var wordsList = page.GetWords().GroupBy(x => x.BoundingBox.Bottom);
</code>
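Rounding the key is the smallest change to that line, for example GroupBy(x => Math.Round(x.BoundingBox.Bottom)). The sketch below generalises it with a bucket size step (a hypothetical parameter, as is the RoundedLineGrouping name) and sorts the groups top to bottom. Its known weakness: two words whose Bottom values straddle a bucket boundary, such as 100.4 and 100.6 with a step of 1, still land on different lines even though they differ by only 0.2. Measuring the distance to a line anchor instead of bucketing avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundedLineGrouping
{
    // Hypothetical helper: buckets items by their bottom Y rounded to the
    // nearest multiple of `step`. Items in the same bucket form one line.
    public static List<IGrouping<double, T>> GroupByRoundedBottom<T>(
        IEnumerable<T> items,
        Func<T, double> bottom,
        double step)
    {
        return items
            .GroupBy(item => Math.Round(bottom(item) / step) * step)
            // PDF user space has its origin at the bottom left, so a larger Y
            // is higher on the page: sort buckets top to bottom.
            .OrderByDescending(group => group.Key)
            .ToList();
    }
}
```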