I want to process tabular data from images. To do this, I read the image with OpenCV and locate the table in the following seven steps. In step 7, I plan to crop the image based on the detected border. With the example data below, it works exactly as I want, because the table inside the image has a black outer border.
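```python
import cv2

image_path = "table.png"  # placeholder path to the input image

# 1. Read the image and convert it to grayscale
image = cv2.imread(image_path, cv2.IMREAD_COLOR)
grayscaled_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 2. Binarize with Otsu's threshold
_, thresholded_image = cv2.threshold(grayscaled_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# 3. Invert so that lines and text become white on a black background
inverted_image = cv2.bitwise_not(thresholded_image)

# 4. Dilate to thicken and connect the lines (None means the default 3x3 kernel)
dilated_image = cv2.dilate(inverted_image, None, iterations=3)

# 5. Find all contours (OpenCV 4.x returns two values, 3.x returns three)
contours, _ = cv2.findContours(dilated_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
image_with_all_contours = image.copy()
cv2.drawContours(image_with_all_contours, contours, -1, (0, 255, 0), 2)

# 6. Keep only contours that approximate to a polygon with four corners
rectangular_contours = []
for contour in contours:
    peri = cv2.arcLength(contour, True)
    epsilon = peri * 0.02
    approx = cv2.approxPolyDP(contour, epsilon, True)
    if len(approx) == 4:
        rectangular_contours.append(approx)
image_with_only_rectangular_contours = image.copy()
cv2.drawContours(image_with_only_rectangular_contours, rectangular_contours, -1, (0, 255, 0), 2)

# 7. Pick the four-corner contour with the largest area
max_area = 0
max_area_contour = None
for contour in rectangular_contours:
    area = cv2.contourArea(contour)
    if area > max_area:
        max_area = area
        max_area_contour = contour
image_with_max_area_contour = image.copy()
if max_area_contour is not None:  # None when no four-corner contour was found
    cv2.drawContours(image_with_max_area_contour, [max_area_contour], -1, (0, 255, 0), 2)
```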
However, some tables have no outer border, as in the following picture. In fact, the images I actually need to process have no lines around the outside; the picture below is a mock-up made only for illustration.
As you can see in the picture above, without an outer border, problems already appear when the thresholded image is produced. As a result, cv2.findContours no longer returns a four-corner contour that encloses the whole table.
Ultimately, I want to read the values in the Name and Favorite columns into pandas. I am currently following the process described in this post. How can I select the rectangle with the largest contour when the table has no outer border?
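For the final pandas step, the cropped table first has to be turned into text by an OCR engine (for example `pytesseract.image_to_string` on the cropped image; that choice is an assumption, not something the post specifies). The sketch below only covers parsing that text. `ocr_text_to_dataframe` is hypothetical, and it assumes one row per line with a single-word Name followed by the Favorite value:

```python
import pandas as pd


def ocr_text_to_dataframe(text, columns=("Name", "Favorite")):
    """Parse plain OCR output of a two-column table into a DataFrame.

    Assumes one table row per line, the first whitespace-separated token
    is the Name, and the rest of the line is the Favorite value.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or incomplete lines
        # Skip the header row if the OCR engine picked it up
        if [p.strip().lower() for p in parts] == [c.lower() for c in columns]:
            continue
        rows.append([p.strip() for p in parts])
    return pd.DataFrame(rows, columns=list(columns))
```

Names containing spaces would break the single-token assumption; splitting on column positions from the detected cell borders would be more robust in that case.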