Four-way XML comparison in C#

I have 4 XML files: A, B, C, and D. I want to know if the difference between A and B is the same as the difference between C and D.

The XML files are serializations of the same .NET object; one of the primary differences will be in a particular list that describes the features available on a particular product. (A description of the feature is itself another object).

All four have very similar structures, but there may be values present in one that aren’t present in another, and some values may be changed. For example, if we consider document A:

<xmldoc>
   <a></a>
   <c></c>
   <d></d>
</xmldoc>

Document B:

<xmldoc>
   <a></a>
   <b></b> <!-- Added -->
   <c></c> <!-- C and D are still ordered in the same way (except for the addition of <b>) -->
   <d></d>
   <e></e> <!-- Also added, but it doesn't affect the order of the other ones -->
</xmldoc>

Now suppose that I have the following documents. Document C is exactly identical to document A:

<xmldoc>
   <a></a>
   <c></c>
   <d></d>
</xmldoc>

Document D is identical to document B.

Since the difference between C and D is exactly the same as the difference between A and B, this should pass. However, suppose that instead we have document D as follows:

<xmldoc>
   <a></a>
   <b></b> 
   <f></f> <!-- Added -->
   <c></c>
   <d></d>
   <e></e>
   <f></f>
</xmldoc>

The difference between C and D is no longer the same as the difference between A and B.

I’m pretty sure that we won’t have a case where document A shows up as:

<xmldoc>
   <c></c>
   <a></a> <!-- This is the same as the original document A except that this was reordered - this shouldn't happen -->
   <d></d>
</xmldoc>

My first thought was to use Microsoft’s XML Diff Patch library, which compares two files and generates a DiffGram, which is an XML document that describes the difference between the two files being compared. My thought is that I could compare A to B to get DiffGram X and C to D to get DiffGram Y, and then do a third XML comparison between X and Y.

The idea sounds good on paper; unfortunately it’s not turning out to be so simple. The difference between A and B is very similar to the difference between C and D, but X and Y look nothing like each other.

The problem is it gives DiffGrams like the following:

<xd:node match="4">
   <xd:node match="2">
      <xd:node match="1">
         <xd:remove match="1-3" />
      </xd:node>
   </xd:node>

   <xd:node match="1">
      <xd:node match="1">
         <xd:remove match="1-3" />
      </xd:node>
   </xd:node>
</xd:node>

This has two problems: first, it’s extremely cryptic – I’d prefer it if it was more human-readable, but it’s not the end of the world if that’s not the case (since my primary purpose is programmatic here). Secondly (and much more critically), it seems like that’s very tightly coupled to the specific XML files that are in that particular comparison.

I originally posted on the Software Recommendations Stack Exchange asking for a .NET library (preferably available as a NuGet package) that would be suitable for this purpose, but didn’t have much luck getting a recommendation. (Full disclosure: I haven’t deleted that question yet but intend to do so shortly.) If such a library exists, I haven’t been able to find it (a lot of them seem like they’re not designed for the purpose I want to use them for and/or aren’t written for the .NET Framework), but if anyone’s aware of such a library, that would definitely be an acceptable solution as well (in fact, I would strongly prefer that to having to implement it myself).

Has anyone successfully done something like this (either by creating your own solution, using Microsoft’s XML Diff library, or using another third-party library)? If so, what did you do?

I’m hoping that this isn’t too broad of a question (if so let me know and I’ll edit), but what would be a good approach to this if I end up writing this myself?


"My thought is that I could compare A to B to get DiffGram X and C to D to get DiffGram Y, and then do a third XML comparison between X and Y."

That seems to be a good start. I guess what is missing here is something like a program or XSLT script to transform “DiffGram X” into a readable representation X’. Then you can apply the same transformation to DiffGram Y, leading to a readable Y’. Comparing X’ and Y’ gives you a final DiffGram Z, which might in turn be transformed into a readable Z’.

What this script or program will look like probably depends on what kind of assumptions you can make about the structure of the input files. Do they really consist of arbitrarily nested XML trees? Do you need to compare attributes, namespace differences, elements, and element text as well? I would be astonished if one could not use that knowledge to simplify the DiffGrams.
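As a rough illustration of what such a "readable representation" could look like, here is a minimal Python sketch (not the XML Diff Patch library, and not C#). It assumes only the tag names of the root's direct children matter, as in the question's sample documents, and it describes each change relative to the preceding unchanged sibling rather than by absolute position. The names `child_delta` and `DOC_D_EXTRA` are made up for this example.

```python
import difflib
import xml.etree.ElementTree as ET


def child_delta(old_xml, new_xml):
    """Describe how the root's child tags changed, without absolute positions.

    Each entry is (operation, tag, anchor), where anchor is the tag of the
    last old-document child before the change (None at the very start).
    """
    old = [child.tag for child in ET.fromstring(old_xml)]
    new = [child.tag for child in ET.fromstring(new_xml)]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    delta = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        anchor = old[i1 - 1] if i1 > 0 else None
        delta.extend(("removed", tag, anchor) for tag in old[i1:i2])
        delta.extend(("added", tag, anchor) for tag in new[j1:j2])
    return delta


# Sample documents modeled on the ones in the question.
DOC_A = "<xmldoc><a/><c/><d/></xmldoc>"
DOC_B = "<xmldoc><a/><b/><c/><d/><e/></xmldoc>"
DOC_C = DOC_A
DOC_D = DOC_B
DOC_D_EXTRA = "<xmldoc><a/><b/><f/><c/><d/><e/><f/></xmldoc>"

# The two deltas can now be compared directly as plain Python lists.
same = child_delta(DOC_A, DOC_B) == child_delta(DOC_C, DOC_D)
different = child_delta(DOC_A, DOC_B) != child_delta(DOC_C, DOC_D_EXTRA)
```

Because the delta carries tag names instead of node indexes, it is not tied to one particular pair of files, which was the main complaint about the raw DiffGrams. A real version would need to recurse into nested elements and, depending on the answers to the questions above, also compare attributes and text.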


The DiffGram representation of changes does not work well for this situation. It is fine for patching files but not really for this type of application. Using DeltaXML gives a more useful representation of the differences between your A and B docs:

<xmldoc deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context" xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
 <a deltaxml:deltaV2="A=B" />
 <b deltaxml:deltaV2="B" />
 <c deltaxml:deltaV2="A=B" />
 <d deltaxml:deltaV2="A=B" />
 <e deltaxml:deltaV2="B" />
</xmldoc>

Then you would get something very similar for your second comparison, C to D where C is like A but D has an added element (note we have called these A and B here so we get a result as near to the first result as we can):

<xmldoc deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context" xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
 <a deltaxml:deltaV2="A=B" />
 <b deltaxml:deltaV2="B" />
 <f deltaxml:deltaV2="B" />
 <c deltaxml:deltaV2="A=B" />
 <d deltaxml:deltaV2="A=B" />
 <e deltaxml:deltaV2="B" />
</xmldoc>

This is basic two-way comparison – which is available for .NET. As you see, you could compare these two results and get a useful diff (some namespace changes would need to be made so the delta files were treated as regular files).
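To make that last step concrete, here is a small Python sketch that compares two delta documents of the shape shown above. It only reads the `deltaxml:deltaV2` attribute from the sample output with the standard library; it does not call any DeltaXML API, and the helper names `delta_signature` and `signature_diff` are invented for this example.

```python
import difflib
import xml.etree.ElementTree as ET

DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_ATTR = "{%s}deltaV2" % DELTA_NS


def delta_signature(delta_xml):
    """Reduce a delta document to a list of (tag, deltaV2 value) pairs."""
    root = ET.fromstring(delta_xml)
    return [(child.tag, child.get(DELTA_ATTR)) for child in root]


def signature_diff(first_delta, second_delta):
    """List the places where two delta documents disagree."""
    a = delta_signature(first_delta)
    b = delta_signature(second_delta)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Condensed copies of the two results shown above.
HEADER = '<xmldoc xmlns:deltaxml="%s" deltaxml:deltaV2="A!=B">' % DELTA_NS
FIRST = (HEADER
         + '<a deltaxml:deltaV2="A=B"/><b deltaxml:deltaV2="B"/>'
         + '<c deltaxml:deltaV2="A=B"/><d deltaxml:deltaV2="A=B"/>'
         + '<e deltaxml:deltaV2="B"/></xmldoc>')
SECOND = (HEADER
          + '<a deltaxml:deltaV2="A=B"/><b deltaxml:deltaV2="B"/>'
          + '<f deltaxml:deltaV2="B"/>'
          + '<c deltaxml:deltaV2="A=B"/><d deltaxml:deltaV2="A=B"/>'
          + '<e deltaxml:deltaV2="B"/></xmldoc>')

changes = signature_diff(FIRST, SECOND)
```

An empty list would mean the A-to-B difference and the C-to-D difference are the same at this level; here the only reported change is the extra `f` element marked as present in "B" only.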

It is also possible using XML merge (though this is Java only) to go one stage better and show all three files in one. As A is the same as C we can treat this as one, so we want to know the changes between A and B and between A and D.

<xmldoc deltaxml:deltaV2="A!=B!=D" deltaxml:version="2.0" deltaxml:content-type="full-context" xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" xmlns:dxu="http://www.deltaxml.com/ns/unified-delta-v1">
 <a deltaxml:deltaV2="A=B=D" />
 <b deltaxml:deltaV2="B=D" />
 <f deltaxml:deltaV2="D" />
 <c deltaxml:deltaV2="A=B=D" />
 <d deltaxml:deltaV2="A=B=D" />
 <e deltaxml:deltaV2="B=D" />
</xmldoc>

That is probably what you need here. You do not say what your end goal is, perhaps to make a concurrent edit style of update, i.e. merge the changes made in both edit paths. As you have found, this is quite difficult! I hope this helps.
Robin

I developed an XSLT diff stylesheet for comparing any two XML files using XSLT 1.0: https://github.com/sflynn1812/xslt-diff-turbo

You alter the variable at the top of the sheet to specify the file being compared against.

A practical example is below. For instance if file a.xml is compared against file b.xml:

a.xml

<?xml version="1.0" encoding="utf-8" ?>
<a>
  <b>test c</b>
  <c>
    <d>test</d>
  </c>
  <b>test</b>
  <c>
    <d>test</d>
  </c>
  <b>test</b>
  <c>
    <d>test</d>
  </c>
</a>

b.xml

<?xml version="1.0" encoding="utf-8" ?>
<a>
  <b>test 2</b>
  <c>
    <d>test</d>
  </c>
  <b>test</b>
  <c>
    <d>test</d>
  </c>
  <b>test</b>
  <c>
    <d>test</d>
  </c>
</a>

The output would be as shown below, with the mismatches in a.xml that are not in b.xml listed under tree->mismatch, and the mismatches in b.xml that are not in a.xml listed under compare->mismatch:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <root>
    <tree>
      <mismatch>
        <a>
          <b>test 2</b>
        </a>
      </mismatch>
      <match>
        <a>
          <c>
            <d>test</d>
          </c>
          <b>test</b>
          <c>
            <d>test</d>
          </c>
          <b>test</b>
          <c>
            <d>test</d>
          </c>
        </a>
      </match>
    </tree>
    <compare>
      <mismatch>
        <a>
          <b>test c</b>
        </a>
      </mismatch>
      <match>
        <a>
          <c>
            <d>test</d>
          </c>
          <b>test</b>
          <c>
            <d>test</d>
          </c>
          <b>test</b>
          <c>
            <d>test</d>
          </c>
        </a>
      </match>
    </compare>
  </root>
</root>

In your case, you would compute the difference between document A and document B, and between document C and document D, then select the mismatch output of both results using XPath queries, and finally run the XSLT stylesheet a third time on those two sets of differences.

Just a broad answer. There is a recommendation called the XML Information Set:

https://www.w3.org/TR/xml-infoset

I’d say the most accurate way to compute the difference (or “delta”) between two XML documents, and then compare such differences themselves, will be after using whichever API/component (out of the box, augmented, or custom) supports the constructs defined in that recommendation the most faithfully.
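One practical way to get closer to comparing information content rather than raw text is to canonicalize each document before diffing it. The Python sketch below uses the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+). Canonical XML is a different W3C recommendation from the Infoset, so treat this as an approximation of the idea, not an Infoset implementation. The sample documents and the `canonical` helper are made up for this example.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Return a canonical serialization, ignoring surrounding whitespace in text."""
    return ET.canonicalize(xml_text, strip_text=True)


# Same content, different attribute order, indentation and empty-tag style.
DOC_1 = '<item  b="2" a="1">\n  <name>x</name>\n  <tags></tags>\n</item>'
DOC_2 = '<item a="1" b="2"><name>x</name><tags/></item>'
# Genuinely different content.
DOC_3 = '<item a="1" b="2"><name>y</name><tags/></item>'

equivalent = canonical(DOC_1) == canonical(DOC_2)
changed = canonical(DOC_1) != canonical(DOC_3)
```

Running the actual diff on canonical forms keeps purely lexical differences (attribute order, `<tags/>` versus `<tags></tags>`, indentation) from showing up as changes in the A-to-B and C-to-D deltas.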
