When I convert the Word document directly to txt, pilcrows and checkbox answers are not encoded, so I lose that information. I'm trying to get the same result as when I save as txt from within Word, but with the pilcrows and checkbox answers included. ChatGPT gave me the code below as an example, but the resulting txt file contains only a few paragraphs from the Word document and is missing most of the content I get when I convert the Word file to txt directly.
from pathlib import Path
from docx import Document


def convert_docx_to_txt(input_file, output_file):
    # Load the Word document
    doc = Document(input_file)
    # Open the output file in write mode
    with open(output_file, 'w', encoding='utf-8') as txt_file:
        # Iterate through each top-level paragraph in the document
        for para in doc.paragraphs:
            text = para.text
            # Replace form checkboxes with "Yes" or "No"
            # Note: The symbol for checkboxes might be different in some documents
            if '☒' in text:  # Checked box (you may need to adjust this symbol)
                text = text.replace('☒', 'Yes')
            if '☐' in text:  # Unchecked box
                text = text.replace('☐', 'No')
            # Write the processed text to the file, one paragraph per line
            txt_file.write(text + '\n')
Is this something I should even be doing with Python?
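One possible direction, as a rough sketch rather than a definitive fix: as far as I can tell, `doc.paragraphs` only returns paragraphs that sit directly in the document body, so text inside tables is skipped, and `para.text` may not include runs nested inside content controls. The version below avoids python-docx and reads `word/document.xml` straight out of the .docx archive using only the standard library. The names `paragraph_text` and `docx_to_text` are made up for illustration. It assumes "pilcrow" means a literal `¶` at the end of each paragraph, that legacy form-field checkboxes are stored as `w:checkBox` elements, and that content-control checkboxes show up as `☒`/`☐` glyphs in the text. It does not handle headers, footers, or text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}tag" form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(elem):
    """An on/off element counts as true unless its w:val says otherwise."""
    return elem.get(W + "val", "1") not in ("0", "false", "off")


def paragraph_text(p):
    """Return the text of one <w:p>, with checkboxes rendered as Yes/No."""
    parts = []
    for node in p.iter():
        if node.tag == W + "t":
            parts.append(node.text or "")
        elif node.tag == W + "checkBox":  # legacy form-field checkbox
            state = node.find(W + "checked")
            if state is None:
                state = node.find(W + "default")
            parts.append("Yes" if state is not None and _is_on(state) else "No")
    # Content-control checkboxes usually appear as glyphs in the text itself
    return "".join(parts).replace("☒", "Yes").replace("☐", "No")


def docx_to_text(docx_path, pilcrow="¶"):
    """Convert every paragraph (including those in tables) to one line of text."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "".join(paragraph_text(p) + pilcrow + "\n" for p in root.iter(W + "p"))
```

Writing the result out would then be something like `Path("out.txt").write_text(docx_to_text("in.docx"), encoding="utf-8")`, where both file names are placeholders.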