I have a .docx file and would like to extract its entire contents. The file is structured like this:
1. Hypothesis
1.1 lots of random text here
1.5 more random text here
1.6 [random table]
2. Procedure
2.1 random text here
2.2 more random text here
2.5 even more random text here
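For reference, and as an assumption about how this file was authored: if numbers such as 1.1 and 2.1 come from Word's automatic list numbering rather than typed characters, they are not part of the paragraph text at all. In `word/document.xml` each such paragraph carries a `w:numPr` element, whose `w:ilvl` gives the nesting level (0 for top level, 1 for the first sub-level) and whose `w:numId` points to a definition in `word/numbering.xml`. python-docx's `paragraph.text` does not include these generated numbers. The minimal sketch below reads the level directly with the standard library (`zipfile` and `xml.etree.ElementTree`) instead of python-docx; `read_document_xml` and `paragraph_levels` are hypothetical helper names.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
W_VAL = "{%s}val" % W_NS["w"]
W_T = "{%s}t" % W_NS["w"]


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml (a .docx is a zip archive)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def paragraph_levels(document_xml):
    """Return (level, text) for each paragraph directly under w:body.

    level is the w:ilvl value when the paragraph has direct list formatting,
    otherwise None. Numbering applied only through a paragraph style is
    NOT detected by this sketch.
    """
    root = ET.fromstring(document_xml)
    body = root.find("w:body", W_NS)
    result = []
    for p in body.findall("w:p", W_NS):
        ilvl = p.find("w:pPr/w:numPr/w:ilvl", W_NS)
        level = int(ilvl.get(W_VAL)) if ilvl is not None else None
        # Join the text of every run; the list number itself is not stored here.
        text = "".join(t.text or "" for t in p.iter(W_T))
        result.append((level, text))
    return result

# Usage (path is a placeholder):
# for level, text in paragraph_levels(read_document_xml(file_path)):
#     print("    " * (level or 0) + text)
```

Indenting by `level` as in the commented usage would show the sub-items nested under their parent items.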
When I run the code below, it prints the paragraphs and the contents of the tables, but not the sub-bullets. How can I print the sub-bullets as well?
Below is my code:
import docx

file_path = r'C:/insert path here/'


def read_docx(file_path):
    doc = docx.Document(file_path)
    # Initialize lists to hold paragraph and table data
    paragraphs = []
    tables = []
    # Iterate over paragraphs
    for paragraph in doc.paragraphs:
        paragraphs.append(paragraph.text)
    # Iterate over tables; each table becomes a list of rows of cell text
    for table in doc.tables:
        table_data = []
        for row in table.rows:
            row_data = []
            for cell in row.cells:
                row_data.append(cell.text)
            table_data.append(row_data)
        tables.append(table_data)
    return paragraphs, tables


paragraphs, tables = read_docx(file_path)
print("Paragraphs:")
for paragraph in paragraphs:
    print(paragraph)
print("\nTables:")
for table in tables:
    for row in table:
        print(row)
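Another possible cause, again an assumption about this particular file: `doc.paragraphs` and `doc.tables` in python-docx only return paragraphs and tables that are direct children of the document body, so anything wrapped in a content control (`w:sdt`) is skipped. Reading paragraphs and tables in two separate loops also loses where the table under 1.6 sits. The sketch below walks the body in document order with the standard library and unwraps content controls; `iter_block_items` and `read_blocks` are hypothetical names, and text boxes, headers and footers are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation prefix for the WordprocessingML namespace
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(element):
    """Concatenate every w:t text node under element."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def iter_block_items(container):
    """Yield ("paragraph", text) or ("table", rows) in document order.

    container is w:body or a w:sdtContent element. Content controls (w:sdt)
    are unwrapped so the paragraphs and tables inside them are not skipped.
    """
    for child in container:
        if child.tag == W + "p":
            yield ("paragraph", _text(child))
        elif child.tag == W + "tbl":
            rows = []
            for tr in child.findall(W + "tr"):
                rows.append([_text(tc) for tc in tr.findall(W + "tc")])
            yield ("table", rows)
        elif child.tag == W + "sdt":
            content = child.find(W + "sdtContent")
            if content is not None:
                yield from iter_block_items(content)
        # Other children (e.g. w:sectPr) are ignored.


def read_blocks(docx_path):
    """Return all body blocks of a .docx file in document order."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return list(iter_block_items(root.find(W + "body")))
```

Printing each block as it comes out of `read_blocks(file_path)` keeps the table directly after the paragraph it belongs to.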
I'm stuck, please help.
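If the items turn out to be auto-numbered, one rough way to print them with numbers is to rebuild the labels from each paragraph's list level (for example the `w:ilvl` value). `add_number_labels` is a hypothetical helper, not a python-docx feature, and it assumes every level is decimal, starts at 1 and counts up by 1. Real start values and formats live in `word/numbering.xml`, and the jump from 1.1 to 1.5 in the example above suggests the actual numbering may not follow that assumption.

```python
def add_number_labels(items):
    """Prefix list paragraphs with a rebuilt outline number such as "2.1".

    items: list of (level, text) pairs, where level is 0 for top-level list
    items, 1 for sub-items, and so on, or None for non-list paragraphs.
    ASSUMPTION: every level is decimal, starts at 1 and increments by 1.
    """
    counters = []  # counters[i] is the current count at nesting level i
    labelled = []
    for level, text in items:
        if level is None:
            labelled.append(text)
            continue
        # Grow the counter stack to this depth, then drop deeper levels
        # so that sub-numbering restarts under each new parent item.
        while len(counters) <= level:
            counters.append(0)
        del counters[level + 1:]
        counters[level] += 1
        label = ".".join(str(n) for n in counters)
        labelled.append(f"{label} {text}")
    return labelled


items = [(0, "Hypothesis"), (1, "first"), (1, "second"), (0, "Procedure"), (1, "step")]
for line in add_number_labels(items):
    print(line)
# prints:
# 1 Hypothesis
# 1.1 first
# 1.2 second
# 2 Procedure
# 2.1 step
```

Restarting deeper counters whenever a shallower item appears is what makes "2.1" follow "2" instead of "1.3".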