This code sample from the Google Cloud docs is supposed to write each table in a document to CSV, HTML, and Markdown files. When I run it in a Google Colab notebook, the only output is ‘Tables in Document’ and no files are written:
from google.cloud.documentai_toolbox import document

# TODO(developer): Uncomment these variables before running the sample.
# Given a local document.proto or sharded document.proto in path
# document_path = "path/to/local/document.json"
# output_file_prefix = "output/table"


def table_sample(document_path: str, output_file_prefix: str) -> None:
    wrapped_document = document.Document.from_document_path(document_path=document_path)

    print("Tables in Document")
    for page in wrapped_document.pages:
        for table_index, table in enumerate(page.tables):
            # Convert table to Pandas Dataframe
            # Refer to https://pandas.pydata.org/docs/reference/frame.html for all supported methods
            df = table.to_dataframe()

            print(df)

            output_filename = f"{output_file_prefix}-{page.page_number}-{table_index}"

            # Write Dataframe to CSV file
            df.to_csv(f"{output_filename}.csv", index=False)

            # Write Dataframe to HTML file
            df.to_html(f"{output_filename}.html", index=False)

            # Write Dataframe to Markdown file
            df.to_markdown(f"{output_filename}.md", index=False)
My goal is to convert the document object produced by a Document AI processor into CSV format.
Reference: https://cloud.google.com/document-ai/docs/samples/documentai-toolbox-table#code-sample
- I added a print statement after each file-conversion call; none of them executed.
- I tried the sample ‘document.json’ objects from the Google Cloud docs; the output stayed the same.
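Since not even `print(df)` runs, the body of the inner loop is apparently never reached, which suggests `page.tables` is empty for every page (or there are no pages at all). A minimal sketch for checking that, assuming only the `pages`, `page_number`, and `tables` attributes already used in the sample; `count_tables_per_page` is a hypothetical helper name and the stand-in object is for illustration only:

```python
from types import SimpleNamespace
from typing import Any, Dict


def count_tables_per_page(wrapped_document: Any) -> Dict[int, int]:
    """Return {page_number: number_of_tables} for a wrapped document.

    Hypothetical helper: it relies only on the `pages`, `page_number`, and
    `tables` attributes that the sample above already uses.
    """
    return {page.page_number: len(page.tables) for page in wrapped_document.pages}


# Stand-in object for illustration only. In the notebook, pass the result of
# document.Document.from_document_path(...) instead.
fake_document = SimpleNamespace(
    pages=[
        SimpleNamespace(page_number=1, tables=[]),
        SimpleNamespace(page_number=2, tables=["table-a", "table-b"]),
    ]
)

counts = count_tables_per_page(fake_document)
print(counts)
```

If every count is zero, or the dictionary is empty, the inner loop has nothing to iterate over, which would match the behaviour described above.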