I have been trying to draw bounding boxes on an image of a document that I am running OCR on with mindee-doctr (docTR), so that I can see the detected lines of text. The problem is that the absolute coordinates I calculate, by multiplying the relative coordinates by the page dimensions, are all shifted towards the top-right corner when drawn on the original image. Is there a way to correct this?
This is my code:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Use docTR to analyze the image and obtain the result
line_boundaries = []
# Setting preserve_aspect_ratio=False or symmetric_pad=False didn't make a difference.
model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images(img_path)
result = model(doc)

# Extract bounding box coordinates for each line
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            # Multiplied relative coordinates with page dimensions to get absolute coordinates
            x_min = round(line.geometry[0][0] * page.dimensions[0])
            y_min = round(line.geometry[0][1] * page.dimensions[1])
            x_max = round(line.geometry[1][0] * page.dimensions[0])
            y_max = round(line.geometry[1][1] * page.dimensions[1])
            line_boundaries.append((x_min, y_min, x_max, y_max))
This is the value of line_boundaries:
[(531, 148, 1321, 184), (2725, 148, 3061, 177), (526, 254, 3071, 295), (526, 288, 3071, 332), (535, 324, 3071, 363), … ]
This is the image with the bounding boxes drawn on it (image: docTR detection with bounding boxes).
I have tried it without rounding, which made no difference, and have also tried using only the predictor, but to no avail. Setting preserve_aspect_ratio=False or symmetric_pad=False in ocr_predictor also made no difference.
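One thing that may be worth checking: as far as I can tell from the docTR documentation, page.dimensions is given as (height, width), while each geometry point is (x, y). If that is right, the x values should be scaled by dimensions[1] and the y values by dimensions[0], which is the opposite of what the code above does. Below is a minimal sketch of the conversion under that assumption. The helper name to_absolute and the example numbers are hypothetical, and it assumes straight (two-point) boxes rather than rotated ones.

```python
def to_absolute(geometry, dimensions):
    """Convert a relative two-point box to absolute pixel coordinates.

    geometry:   ((x_min, y_min), (x_max, y_max)), each value in [0, 1]
    dimensions: (height, width) in pixels -- ASSUMED ordering, taken from
                the docTR documentation for Page.dimensions
    """
    height, width = dimensions
    (x_min, y_min), (x_max, y_max) = geometry
    # x is scaled by the width, y by the height
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Hypothetical example: a page 1000 px high and 2000 px wide
box = to_absolute(((0.25, 0.1), (0.75, 0.2)), (1000, 2000))
print(box)
```

In the loop above this would be called as to_absolute(line.geometry, page.dimensions). Comparing page.dimensions with the size of the original image should show quickly whether the ordering is the cause.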