Using the doctr library, I recognize text in a PDF file. From the recognized text, I select keywords and their coordinates. I receive the coordinates in the following format:
list_with_coordinates = [(0.09370404411764705, 0.0439453125), (0.5925912552521009, 0.1796875), (0.5925912552521009, 0.2041015625)]
Next, I convert these relative coordinates into actual page points so that I can plot them on the page. I do this with the fitz (PyMuPDF) library.
import fitz  # PyMuPDF

doc = fitz.open("file_name.pdf")
page = doc[0]

# Scale the relative (0..1) coordinates to page units.
list_with_points = []
for x, y in list_with_coordinates:
    list_with_points.append(fitz.Point(x * page.rect.width, y * page.rect.height))
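To make the conversion step explicit, here is a minimal, library-independent sketch of the same scaling. The helper name `scale_to_page` and the 600 x 800 page size are illustrative assumptions; they are not part of my actual code or of PyMuPDF.

```python
def scale_to_page(normalized_coords, page_width, page_height):
    """Scale (x, y) pairs from the 0..1 range to page units."""
    return [(x * page_width, y * page_height) for x, y in normalized_coords]


# Illustrative input values and page size (assumptions, not real data).
points = scale_to_page([(0.5, 0.25), (1.0, 1.0)], 600, 800)
print(points)  # [(300.0, 200.0), (600.0, 800.0)]
```

This assumes the relative coordinates are measured from the same corner and along the same axes as the page rectangle, which is exactly the assumption that seems to break for some of my documents.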
Next, I draw lines from point to point.
for i in range(len(list_with_points) - 1):
    # Connect each point to the next one.
    page.draw_line(list_with_points[i], list_with_points[i + 1])
This is where my question comes in. In the fitz library, the origin (0, 0) of the coordinate system is in the top-left corner, and for most documents this holds: the lines connect the words I need. In some documents, however, the origin appears to be in the top-right corner. I have not found any information about this anywhere.
Why does this happen, and how can I fix it?
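In case it helps with diagnosis, below is a sketch of how the geometry of a working page and a misbehaving page could be compared. It rests on an unverified assumption of mine: that the affected pages differ in their rotation or page boxes. The helper name `describe_page_geometry` is hypothetical; `rotation`, `rect`, `mediabox`, and `cropbox` are attributes of a PyMuPDF page.

```python
def describe_page_geometry(page):
    """Collect page attributes that might explain a shifted origin.

    `page` is expected to behave like a PyMuPDF page object.
    Assumption (unverified): rotation or page boxes differ on affected pages.
    """
    return {
        "rotation": page.rotation,          # page rotation in degrees
        "rect": tuple(page.rect),           # page rectangle as displayed
        "mediabox": tuple(page.mediabox),   # physical page size
        "cropbox": tuple(page.cropbox),     # visible region of the page
    }


# Usage with PyMuPDF (requires the file to exist):
# import fitz
# doc = fitz.open("file_name.pdf")
# print(describe_page_geometry(doc[0]))
```

Comparing this output for a document where the lines are drawn correctly and one where they are not should show whether the two pages are set up differently.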