I am trying to draw bounding boxes on an image of a document that I am running OCR on with mindee-doctr (docTR), so that I can see the detected lines of text. I calculate the absolute coordinates of each box by multiplying its relative coordinates by the page dimensions, but when I draw the boxes on the original image they are all shifted towards the top-right corner. Is there a way to correct this?
This is my code for calculating bboxes:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Use docTR to analyze the image and obtain the result
line_boundaries = []
model = ocr_predictor(pretrained=True)  # setting preserve_aspect_ratio=False or symmetric_pad=False didn't make a difference
doc = DocumentFile.from_images(img_path)
result = model(doc)

# Extract bounding box coordinates for each line
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            # Multiply relative coordinates by the page dimensions to get absolute coordinates
            x_min = round(line.geometry[0][0] * page.dimensions[0])
            y_min = round(line.geometry[0][1] * page.dimensions[1])
            x_max = round(line.geometry[1][0] * page.dimensions[0])
            y_max = round(line.geometry[1][1] * page.dimensions[1])
            line_boundaries.append((x_min, y_min, x_max, y_max))
This is the value of line_boundaries:
[(531, 148, 1321, 184), (2725, 148, 3061, 177), (526, 254, 3071, 295), (526, 288, 3071, 332), (535, 324, 3071, 363), … ]
This is the function I used to draw the boxes:
import cv2
from google.colab.patches import cv2_imshow  # Colab only

def draw_rectangles(image_path, line_boundaries):
    """
    Draws rectangles on an image using the provided line boundaries.
    Args:
        image_path: The path to the image file.
        line_boundaries: A list of line boundaries, where each boundary is a tuple (x_min, y_min, x_max, y_max).
    Returns:
        None
    """
    # Load the image
    image = cv2.imread(image_path)
    #image = DocumentFile.from_images(image_path)[0]  # essentially equivalent to cv2.imread()
    # Iterate over the line boundaries and draw rectangles
    for boundary in line_boundaries:
        #x1, y1, x2, y2 = int(boundary[0][0]), int(boundary[0][1]), int(boundary[2][0]), int(boundary[2][1])
        x1, y1, x2, y2 = map(int, boundary)  # Convert coordinates to integers
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    # Display the image with rectangles
    cv2_imshow(image)  # Use cv2_imshow instead of cv2.imshow in Colab only
    cv2.waitKey(0)
    cv2.destroyAllWindows()
This is the image with the bounding boxes drawn on it:
[Image: doctr detection with bbox]
I have tried it without rounding, which made no difference, and have also tried using only the predictor, but to no avail. Setting preserve_aspect_ratio=False or symmetric_pad=False in ocr_predictor also made no difference.
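For comparison, here is a minimal sketch of the scaling step with the axes paired the other way round. It assumes that docTR reports page.dimensions as (height, width) and line.geometry as ((x_min, y_min), (x_max, y_max)) in relative coordinates. The helper name to_absolute and the sample values are hypothetical and are not part of docTR or of my data.

```python
def to_absolute(geometry, dimensions):
    """Scale a relative box to pixel coordinates.

    Assumption: geometry is ((x_min, y_min), (x_max, y_max)) in [0, 1]
    and dimensions is (height, width), not (width, height).
    """
    (x_min, y_min), (x_max, y_max) = geometry
    height, width = dimensions
    # x values scale with the width, y values scale with the height
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Hypothetical portrait page: 2000 px high, 1000 px wide
box = to_absolute(((0.25, 0.125), (0.75, 0.25)), (2000, 1000))
print(box)
```

If that assumption holds, multiplying x by the height and y by the width on a portrait page would push the boxes right and up, which would be consistent with the shift I am seeing.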