I am working on a project to detect tables and extract columns from images using a deep learning model with PyTorch and OpenCV. While I can successfully detect the tables and obtain contours for the individual cells, I am struggling with accurately defining the column boundaries.
Here’s a summary of my approach:
Preprocessing: Resize and normalize the image.
Model Inference: Use a custom PyTorch model to generate table and column masks.
Contour Detection: Detect contours in the table mask using cv2.findContours.
Filtering Contours: Filter out small or irrelevant contours based on area.
Bounding Rectangles: Draw bounding boxes around the detected contours.
I want to improve the column detection by considering the spatial relationships between the contours. Specifically, I want to group contours by their horizontal overlap: if two contours overlap significantly along the x-axis, they should be treated as part of the same column.
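To make this concrete, here is a rough sketch of the grouping I have in mind, working on the (x, y, w, h) bounding rects of the cell contours. The min_overlap_ratio of 0.5 is an arbitrary guess on my part, and I'm not sure a greedy merge like this is even the right approach:

def x_overlap(a, b):
    # Horizontal overlap in pixels between two (x, y, w, h) rects
    left = max(a[0], b[0])
    right = min(a[0] + a[2], b[0] + b[2])
    return max(0, right - left)

def group_into_columns(boxes, min_overlap_ratio=0.5):
    # Greedily merge boxes whose x-ranges overlap by at least
    # min_overlap_ratio of the narrower box; returns merged (x, y, w, h) rects
    columns = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for i, col in enumerate(columns):
            if x_overlap(col, box) >= min_overlap_ratio * min(col[2], box[2]):
                x = min(col[0], box[0])
                y = min(col[1], box[1])
                w = max(col[0] + col[2], box[0] + box[2]) - x
                h = max(col[1] + col[3], box[1] + box[3]) - y
                columns[i] = (x, y, w, h)  # grow the column to include this box
                break
        else:
            columns.append(box)  # no overlap found, start a new column
    return columns

# e.g. column_boxes = group_into_columns([cv2.boundingRect(c) for c in contours])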
What I’ve Tried:
Sorting contours based on their x-coordinates (see the sketch after this list).
Filtering contours based on area to remove noise.
Using bounding boxes to visualize detected tables.
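Concretely, the sorting and area filtering looked roughly like this (the 3000-pixel area threshold is a value I picked by eye):

boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 3000]
boxes.sort(key=lambda b: b[0])  # sort cells left-to-right by x-coordinate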
Issues:
Difficulty in determining accurate column boundaries.
Contours often represent individual cells, making it hard to group them into columns.
Code:
Here’s the relevant part of my code:
import cv2
import numpy as np
import pytesseract
import torch
import torch.nn as nn
from PIL import Image
from albumentations import Compose, Normalize
from albumentations.pytorch import ToTensorV2

# Model Definition (DenseNet and TableNet classes omitted for brevity)

TRANSFORM = Compose([
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255),
    ToTensorV2()
])

def perform_ocr(image):
    # Perform OCR using Tesseract
    image = Image.fromarray(image)
    return pytesseract.image_to_string(image, lang='eng')

def predict(img_path):
    orig_image = Image.open(img_path).resize((1024, 1024))
    test_img = np.array(orig_image.convert('LA').convert("RGB"))
    image = TRANSFORM(image=test_img)["image"]

    with torch.no_grad():
        image = image.unsqueeze(0)
        table_out, column_out = model(image)
        table_out = torch.sigmoid(table_out).detach().numpy().squeeze(0).transpose(1, 2, 0) > 0.5
        column_out = torch.sigmoid(column_out).detach().numpy().squeeze(0).transpose(1, 2, 0) > 0.5

    # findContours needs a single-channel uint8 image, so drop the channel axis
    contours, _ = cv2.findContours(table_out.squeeze().astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if cv2.contourArea(c) > 3000]

    # Visualization and OCR steps omitted for brevity
    vis = cv2.cvtColor(np.array(orig_image), cv2.COLOR_RGB2BGR)  # cv2 drawing needs a NumPy array, not a PIL image
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(vis, (x, y), (x + w, y + h), (0, 0, 255), 4)

    # How to overlap contours to determine columns?
    # ...

    cv2.imshow("Detected Tables", vis)
    cv2.waitKey(0)

# Load model and process image (model loading code omitted for brevity)
Questions:
How can I group the detected contours by their x-axis overlap to accurately determine the column boundaries?
What is the best approach to handle overlapping contours in OpenCV for column detection?
Are there any recommended techniques or best practices to improve the accuracy of column detection in tables?
Any guidance or suggestions on how to effectively group overlapping contours into columns would be greatly appreciated! I'm a newbie in image processing, so please feel free to be direct in your answer!
Note that this project involves scanned documents, in case that is important.