Thiết kế website giá rẻ

Question

I’m working on a C# project that involves parsing WordPerfect documents to identify and process nested forms. The program is called WP_Mapper, and it uses the WP_Reader library to parse WordPerfect documents in the WP6x format. The goal is to create a dependency map of parent/child links between different forms. I’ve attached Git links to both my project and the WP_Reader project I am using.

Program Overview

The user selects an initial WordPerfect file to map.
The program parses the document and looks for any nested forms using the <merge> tags.
If nested forms are found, the program recursively parses those forms and builds a visual tree structure showing the dependencies.
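
The three steps above can be sketched without any UI or WordPerfect parsing. This is a minimal illustration, not the project's actual code: `FormNode`, `DependencyMapper`, and the `getNestedPaths` callback are hypothetical names, and the callback stands in for "parse the file and return the nested form paths". The `ancestors` set is an added assumption that guards against a form that ends up including itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node; the real project uses a WinForms TreeNode instead.
public sealed class FormNode
{
    public string Path { get; }
    public List<FormNode> Children { get; } = new List<FormNode>();
    public FormNode(string path) { Path = path; }
}

public static class DependencyMapper
{
    // getNestedPaths is a stand-in for parsing a file and returning its nested forms.
    public static FormNode Build(
        string path,
        Func<string, IEnumerable<string>> getNestedPaths,
        HashSet<string> ancestors = null)
    {
        ancestors = ancestors ?? new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var node = new FormNode(path);

        // If this path is already on the current branch, stop here to avoid a cycle.
        if (!ancestors.Add(path))
        {
            return node;
        }

        foreach (string child in getNestedPaths(path))
        {
            node.Children.Add(Build(child, getNestedPaths, ancestors));
        }

        ancestors.Remove(path);
        return node;
    }

    // Prints the tree with two spaces of indentation per level.
    public static void Print(FormNode node, int depth = 0)
    {
        Console.WriteLine(new string(' ', depth * 2) + node.Path);
        foreach (FormNode child in node.Children)
        {
            Print(child, depth + 1);
        }
    }

    public static void Main()
    {
        // Made-up links: doc1 nests doc2, and doc2 nests doc3.
        var links = new Dictionary<string, string[]>
        {
            ["doc1.frm"] = new[] { "doc2.frm" },
            ["doc2.frm"] = new[] { "doc3.frm" },
        };

        FormNode root = Build(
            "doc1.frm",
            p => links.TryGetValue(p, out var kids) ? kids : Array.Empty<string>());
        Print(root);
    }
}
```

Keeping the traversal separate from the parsing makes it possible to test the recursion with made-up links before any WordPerfect file is involved.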

My Code (also on git using the link above):

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using WP_Reader;

namespace WP_Mapper
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void btnSelectFile_Click(object sender, EventArgs e)
        {
            using (OpenFileDialog openFileDialog = new OpenFileDialog())
            {
                openFileDialog.Filter = "WordPerfect files (*.wpd;*.frm)|*.wpd;*.frm|All files (*.*)|*.*";
                if (openFileDialog.ShowDialog() == DialogResult.OK)
                {
                    string filePath = openFileDialog.FileName;
                    WP6Document doc = new WP6Document(filePath);
                    TreeNode rootNode = new TreeNode(filePath);
                    treeViewDocuments.Nodes.Add(rootNode);
                    ParseDocument(doc, rootNode);
                }
            }
        }

        private void treeViewDocuments_AfterSelect(object sender, TreeViewEventArgs e)
        {
            // Handle tree view selection event if needed
        }

        private void ParseDocument(WP6Document doc, TreeNode parentNode)
        {
            // Accumulate the entire document content into a single string
            string documentContent = string.Empty;

            foreach (WPToken token in doc.documentArea.WPStream)
            {
                if (token is WP_Reader.Character character)
                {
                    documentContent += character.content;
                }
                else if (token is WP_Reader.Function function)
                {
                    documentContent += $"<{function.name}>";
                }
            }

            // Log the accumulated document content for debugging
            string logFilePath = @"C:\path\to\output\documentContent.txt";
            File.WriteAllText(logFilePath, documentContent);
            MessageBox.Show($"Document content logged to: {logFilePath}");

            // Use a flexible regex to find <merge> tags
            var mergeRegex = new Regex(@"<merge>(.*?)</merge>", RegexOptions.Singleline);
            var matches = mergeRegex.Matches(documentContent);

            if (matches.Count == 0)
            {
                MessageBox.Show("No <merge> tags found.");
                return;
            }

            foreach (Match match in matches)
            {
                string nestedFilePath = match.Groups[1].Value;

                // Debug output
                Console.WriteLine($"Found nested form: {nestedFilePath}");
                MessageBox.Show($"Found nested form: {nestedFilePath}");

                TreeNode childNode = new TreeNode(nestedFilePath);
                parentNode.Nodes.Add(childNode);

                // Recursively parse the nested form
                try
                {
                    if (File.Exists(nestedFilePath))
                    {
                        WP6Document nestedDoc = new WP6Document(nestedFilePath);
                        ParseDocument(nestedDoc, childNode);
                    }
                    else
                    {
                        MessageBox.Show($"Nested file not found: {nestedFilePath}");
                    }
                }
                catch (Exception ex)
                {
                    MessageBox.Show($"Error parsing nested document: {ex.Message}");
                }
            }
        }
    }
}

The raw parsed text that WP_Reader gets from the initial selected file is below. In it, you can clearly see the part that has [File Path of Nested Form That I Want To Map]. Ideally, if there were three levels of nested forms (document 1 has document 2 nested in it, and document 2 has document 3 nested in it), then it would parse each one and create the parent/child node graph. I just for the life of me cannot get it to find the second file using the parsed text:

<global_on><set_language><global_off><check_as_you_go>TEST<hard_eol><hard_eol><check_as_you_go><merge>C:UsersdylanDesktopTEST1A.frm<merge><check_as_you_go><hard_eol><hard_eol>TEST<left_tab>
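
One detail stands out in the parsed text above: the path sits between two identical `<merge>` tokens, and no `</merge>` appears anywhere, because ParseDocument writes every function token as `<name>`. A pattern that requires `</merge>` would therefore not match this text. The following is a minimal sketch, assuming the second `<merge>` token marks the end of the path (an assumption drawn only from this one dump, not from WP_Reader's documentation). `MergePathFinder` and `FindMergePaths` are hypothetical names, and the sample path is copied exactly as it appears in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MergePathFinder
{
    // Assumption: a nested form path is enclosed by two identical <merge> markers.
    private static readonly Regex MergePair =
        new Regex(@"<merge>(.*?)<merge>", RegexOptions.Singleline);

    public static List<string> FindMergePaths(string documentContent)
    {
        var paths = new List<string>();

        // Matches do not overlap, so markers are consumed in pairs: 1+2, 3+4, ...
        foreach (Match match in MergePair.Matches(documentContent))
        {
            string candidate = match.Groups[1].Value.Trim();
            if (candidate.Length > 0)
            {
                paths.Add(candidate);
            }
        }

        return paths;
    }

    public static void Main()
    {
        string sample =
            "<check_as_you_go><merge>C:UsersdylanDesktopTEST1A.frm<merge><check_as_you_go>";

        foreach (string path in FindMergePaths(sample))
        {
            Console.WriteLine(path); // prints C:UsersdylanDesktopTEST1A.frm
        }
    }
}
```

If other function tokens can appear between the two markers, the captured text would include them as `<name>` fragments and would need further cleanup before being used as a file path.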

If anyone can help, it would be much appreciated. I feel like I am missing something right in front of my face, and it is driving me up a wall.

Confirmed that WP_Reader successfully parses the document and the tags with file paths are present in the raw parsed text.
Logged the raw parsed text to a file and visually inspected it. The tag and file path don’t appear to have any obvious formatting issues.
Simplified the regex pattern to “([^<]+)” to make it more flexible and less restrictive. Updated the regex pattern to handle potential variations in tag structure and whitespace: <mergesr=””([^””]+)””s/?>. Simplified the regex to match the exact structure of the example output: <merge>(.*?)</merge>.
Attempted to exclude irrelevant tags and only accumulate meaningful content.
Identified issues with unexpected tags like <soft_space> and <left_tab>. Refined the accumulation logic to exclude these tags and focus on <merge> tags.
Tried to get both Google Gemini and ChatGPT to figure it out, to no avail.
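
An alternative to building one string and running a regex over it is to walk the token stream directly and collect characters only while between two merge tokens, which sidesteps tags like <soft_space> and <left_tab> entirely. The sketch below is self-contained and uses stand-in types: `Token`, `CharToken`, and `FuncToken` are hypothetical substitutes for WP_Reader's `WPToken`, `Character`, and `Function`, whose real member types may differ. It also assumes, as the dump suggests, that a path is enclosed by a pair of merge tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Stand-ins for the WP_Reader token types; the real classes may differ.
public abstract class Token { }
public sealed class CharToken : Token { public string Content; }
public sealed class FuncToken : Token { public string Name; }

public static class TokenWalker
{
    public static List<string> CollectMergePaths(IEnumerable<Token> stream)
    {
        var paths = new List<string>();
        StringBuilder current = null; // non-null only while between two merge tokens

        foreach (Token token in stream)
        {
            if (token is FuncToken f && f.Name == "merge")
            {
                if (current == null)
                {
                    current = new StringBuilder(); // opening marker
                }
                else
                {
                    // Closing marker: keep the collected text if there is any.
                    if (current.Length > 0)
                    {
                        paths.Add(current.ToString());
                    }
                    current = null;
                }
            }
            else if (current != null && token is CharToken c)
            {
                current.Append(c.Content);
            }
            // Any other function token (soft_space, left_tab, ...) is ignored.
        }

        return paths;
    }

    public static void Main()
    {
        var stream = new List<Token>
        {
            new CharToken { Content = "TEST" },
            new FuncToken { Name = "merge" },
            new CharToken { Content = "TEST1A" },
            new FuncToken { Name = "soft_space" },
            new CharToken { Content = ".frm" },
            new FuncToken { Name = "merge" },
        };

        foreach (string path in CollectMergePaths(stream))
        {
            Console.WriteLine(path); // prints TEST1A.frm
        }
    }
}
```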

Issues Parsing Nested WordPerfect Forms with C# and WP_Reader