As an example, I have this image and would like to convert it into an editable Excel table. I have tried using the 'pytesseract' library, but it doesn't accurately extract the text from the image into a string that can be converted to CSV. I created the string manually to get the CSV, but that wouldn't be feasible for a larger table. What can I do?
image_to_extract
import pandas as pd
import pytesseract
from PIL import Image
from io import StringIO
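# Placeholder paths (not given in the original post); replace with your own files
image_path = "image_to_extract.png"
excel_path = "output.xlsx"
image = Image.open(image_path)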
# Use pytesseract to extract the text
text = pytesseract.image_to_string(image)
# Print extracted text to verify content (optional)
print("Extracted Text:\n", text)
# Manually typed CSV string (the OCR output was not clean enough to parse directly)
data = """
SN,Abbreviation,Attribute Definition,4 (323),5 (60),6 (335),7 (72)
1,Input 2,All inputs from scenario 2,x,x,x,x
2,Input 2_4_x,Rolling average of all inputs from scenario 2 - team1,x,x,,
3,Input 2_4_y,Rolling average of all inputs from scenario 2 - team2,x,x,,
4,Input 3,All inputs from scenario 3,x,x,x,x
5,Input 3_4_x,Rolling average of all inputs from scenario 3 - team1,x,x,,
6,Input 3_4_y,Rolling average of all inputs from scenario 3 - team2,x,x,,
7,home_next,Who would be at home in the next game,x,x,x,x
8,date_next,Date of the next game,x,x,x,x
9,team_opp_next_x,Next opponent,x,x,x,x
10,PER_Combined_opp_next_x,Combined Player Efficiency Rating (PER) of next opponent,x,x,,
11,elo_rating_opp_next_x,Elo rating of next opponent,x,x,x,
12,head_to_head_win_ratio_next,Historic Head-to-Head Win Ratio against next opponent,x,x,,
"""
# Convert the CSV string into a DataFrame
df = pd.read_csv(StringIO(data))
# Save the DataFrame to an Excel file
df.to_excel(excel_path, index=False)
print(f"Data saved to {excel_path}")
Extracted text
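One direction that might work better than `image_to_string` is `pytesseract.image_to_data`, which returns a bounding box for every recognised word, so the table can be rebuilt from positions rather than from a flat string. The sketch below is only a rough heuristic under several assumptions: the image is not skewed, the first row is the header, each column's cells start near the left edge of its header, and the pixel thresholds `row_tol` and `col_gap` (hypothetical names and values) suit the image. All helper names are made up for illustration.

```python
import csv


def ocr_image(image_path):
    """Run Tesseract and return word boxes as a dict of lists."""
    # Imported here so the helpers below can be tried without Tesseract installed.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )


def words_from_ocr(ocr):
    """Keep non-empty words with their horizontal extent and vertical centre."""
    words = []
    for text, left, top, width, height in zip(
        ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
    ):
        text = str(text).strip()
        if text:
            words.append({"text": text, "left": left,
                          "right": left + width, "mid_y": top + height / 2})
    return words


def group_rows(words, row_tol=10):
    """Put words whose vertical centres are within row_tol pixels in one row."""
    rows = []
    for w in sorted(words, key=lambda w: w["mid_y"]):
        if rows and abs(w["mid_y"] - rows[-1][-1]["mid_y"]) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(r, key=lambda w: w["left"]) for r in rows]


def split_cells(row, col_gap=25):
    """Merge adjacent words into a cell; a gap wider than col_gap starts a new cell."""
    cells = [[row[0]]]
    for prev, cur in zip(row, row[1:]):
        if cur["left"] - prev["right"] > col_gap:
            cells.append([cur])
        else:
            cells[-1].append(cur)
    return [(" ".join(w["text"] for w in c), c[0]["left"]) for c in cells]


def to_table(rows, col_gap=25):
    """Align each cell to the header column with the nearest left edge."""
    header = split_cells(rows[0], col_gap)
    col_lefts = [left for _, left in header]
    table = [[text for text, _ in header]]
    for row in rows[1:]:
        line = [""] * len(col_lefts)  # columns with no word stay empty
        for text, left in split_cells(row, col_gap):
            idx = min(range(len(col_lefts)), key=lambda i: abs(col_lefts[i] - left))
            line[idx] = (line[idx] + " " + text).strip()
        table.append(line)
    return table


def image_to_csv(image_path, csv_path):
    """Full pipeline: OCR the image, rebuild the grid, write it as CSV."""
    table = to_table(group_rows(words_from_ocr(ocr_image(image_path))))
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(table)
```

Calling `image_to_csv(image_path, "table.csv")` would then give a file that `pd.read_csv` and `df.to_excel` can handle as in the code above. Aligning cells to header positions is what lets empty cells stay empty instead of shifting the remaining values left, but the thresholds would almost certainly need tuning per image.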