I am working on a deserializer in C# for an XML file type for a program I don’t have any control over. Unfortunately, the XML file structure completely breaks conventions in two major ways, as far as I can tell, and it’s been complicating the patterns for me.
The file is used to define user interface components within a video game. There is a main XML element called “layout” that is always the root element, with two child elements: <hierarchy> and <components>.
<layout
    version="137"
    comment="Agh!">
    <hierarchy>
        <!-- see below -->
    </hierarchy>
    <components>
        <!-- see below -->
    </components>
</layout>
The hierarchy element contains the hierarchical structure of the UI tree defined by each layout file. It always starts with a single “root” node, and the root node can have a single child, but beneath that any node can have as many children as needed.
<hierarchy>
    <root this="1234">
        <main_child this="5678">
            <child_a/>
            <child_b/>
            <child_c>
                <grandchild_a/>
            </child_c>
        </main_child>
    </root>
</hierarchy>
The components element has a similarly strange structure: every child of the “components” node is one of two very similar types, but the actual tag of each node can vary as above.
<components>
    <root this="1234" id="root">
        <!-- etc... -->
    </root>
    <main_child
        this="5678"
        id="main_child"
        state="active">
    </main_child>
    <!-- and so on, with one "primary" node for each UI component in this file -->
</components>
These make it difficult to quickly parse the child elements of both <hierarchy> and <components>, and I’m having trouble getting XmlSerializer to recognize any of the children in either section. I can get “root” to be recognized by the hierarchy class easily, but I haven’t managed to make that work recursively, and I’m also having difficulty getting the array beneath <components> to work.
Originally, I was developing a manual converter using XDocument, but that proved to be far too much work, since it required unique handling of every single attribute. There are hundreds of attributes in this file, and they can also change between “versions” of it.
I’ve been testing this using the various XML attributes available as hints for XmlSerializer.
public class LayoutModel {
    [XmlAttribute("version")]
    public uint Version { get; set; }
    // etc ...
    [XmlElement("hierarchy")]
    public HierarchyModel Hierarchy { get; set; }
    [XmlElement("components", typeof(ComponentModel))]
    public ComponentModel[] Components { get; set; }
}
public class HierarchyModel {
    // This converts fine, but the children of the root node are not picked up.
    [XmlElement("root")]
    public HierarchyNodeModel RootNode { get; set; }
}
public class HierarchyNodeModel {
    [XmlArrayItem(Type = typeof(TestHierarchyNodeModel))]
    [XmlArray]
    public TestHierarchyNodeModel[] ChildNodes { get; set; }
    [XmlAttribute("this")]
    public string GUID { get; set; }
}
public class ComponentModel {
    [XmlAttribute("this")]
    public string GUID { get; set; }
    [XmlAttribute("id")]
    public string Id { get; set; }
}
Using the above, the LayoutModel deserializes fine, and I get the Hierarchy -> Root connection, but “ChildNodes” in the Root is null, so nothing beneath it is deserialized. Likewise, the “Components” array is empty, at size 0.
Simple example XML structure for this problem at hand:
<?xml version="1.0"?>
<layout
    version="137"
    comment=""
    precache_condition="">
    <hierarchy>
        <root this="2A19D461-6F9E-45F7-977F41D42D07FDB0">
            <template_row_header this="F3416CFD-8BC7-4276-86996FD67D7F6A75">
                <dy_title this="77E6B934-1E3B-40E5-BC14C7F643672167"/>
            </template_row_header>
        </root>
    </hierarchy>
    <components>
        <root
            this="2A19D461-6F9E-45F7-977F41D42D07FDB0"
            id="root">
        </root>
        <template_row_header
            this="F3416CFD-8BC7-4276-86996FD67D7F6A75"
            id="template_row_header"
            offset="0.00,4.00">
        </template_row_header>
        <dy_title
            this="77E6B934-1E3B-40E5-BC14C7F643672167"
            id="dy_title"
            offset="0.00,0.00">
        </dy_title>
    </components>
</layout>
You could try using Newtonsoft.Json to convert the XML to a JSON string, then deserialize that JSON string.
var doc = XDocument.Load("test.xml");
var jsonString = JsonConvert.SerializeXNode(doc);
//parse to jObject
var jObject = JsonConvert.DeserializeObject<JObject>(jsonString);
var value = jObject["layout"]["hierarchy"]["root"]["template_row_header"]["dy_title"]["@this"];
This approach is very cumbersome if you have dynamic XML. As class definitions have to be done at compile time, you don’t get much flexibility at runtime.
But there’s another class for handling XML: XDocument.
It does not require a predefined class to deserialize into and allows a more dynamic approach.
Here’s a code snippet that traverses the whole XML:
using System.Xml.Linq;
var rawJson = @"<?xml version=""1.0""?>
<layout
version=""137""
comment=""""
precache_condition="""">
<hierarchy>
<root this=""2A19D461-6F9E-45F7-977F41D42D07FDB0"">
<template_row_header this=""F3416CFD-8BC7-4276-86996FD67D7F6A75"">
<dy_title this=""77E6B934-1E3B-40E5-BC14C7F643672167""/>
</template_row_header>
</root>
</hierarchy>
<components>
<root
this=""2A19D461-6F9E-45F7-977F41D42D07FDB0""
id=""root"">
</root>
<template_row_header
this=""F3416CFD-8BC7-4276-86996FD67D7F6A75""
id=""template_row_header""
offset=""0.00,4.00"">
</template_row_header>
<dy_title
this=""77E6B934-1E3B-40E5-BC14C7F643672167""
id=""dy_title""
offset=""0.00,0.00"">
</dy_title>
</components>
</layout>";
var xDoc = XDocument.Parse(rawJson);
TraverseElement(xDoc.Root);
void TraverseElement(XElement node, string currentPath = "")
{
    if (!node.HasElements)
        Console.WriteLine($"Value for path {currentPath} is {node.Value}, first attribute value: {node.FirstAttribute?.Value}");
    // Elements() returns direct children only. Using Descendants() here would
    // revisit nested nodes on every level and print them under truncated paths.
    foreach (var item in node.Elements())
    {
        TraverseElement(item, currentPath + " -> " + item.Name);
    }
}
And the output is:
Value for path  -> hierarchy -> root -> template_row_header -> dy_title is , first attribute value: 77E6B934-1E3B-40E5-BC14C7F643672167
Value for path  -> components -> root is , first attribute value: 2A19D461-6F9E-45F7-977F41D42D07FDB0
Value for path  -> components -> template_row_header is , first attribute value: F3416CFD-8BC7-4276-86996FD67D7F6A75
Value for path  -> components -> dy_title is , first attribute value: 77E6B934-1E3B-40E5-BC14C7F643672167
UPDATE
Below are more examples of how to get information out of the XML using the XDocument class. Comments in code:
var xDoc = XDocument.Parse(rawJson);
var componentChildren = xDoc.Descendants("components").Descendants();
foreach (var child in componentChildren)
{
    var tagName = child.Name;
    Console.WriteLine($"tagName={tagName} which has following attributes:");
    PrintAllAttributes(child);
    PrintSpecificAttribute(child, "this");
}
// This is the way to access all attributes of an element.
void PrintAllAttributes(XElement child)
{
    var attributes = child.Attributes();
    foreach (var attribute in attributes)
        Console.WriteLine(attribute);
}
// And here's how you can access a specific attribute and get its value.
void PrintSpecificAttribute(XElement child, string attributeName)
{
    // Attribute() returns null when the element has no such attribute.
    var attribute = child.Attribute(attributeName);
    Console.WriteLine(attribute?.Value);
}
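Since the question mentions hundreds of attributes that change between versions, one way to avoid per-attribute handling with XDocument is to copy each element into a generic node that keeps its tag name, an attribute dictionary, and its children. The following is only a sketch under assumptions: UiNode and its members are hypothetical names that appear in neither the question nor any library, and the sample XML is the small hierarchy fragment from the question. It also assumes no two attributes on one element share a local name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var xml = @"<hierarchy>
  <root this=""1234"">
    <main_child this=""5678"">
      <child_a />
      <child_b />
    </main_child>
  </root>
</hierarchy>";

// Start from the <root> element inside <hierarchy>.
var rootElement = XDocument.Parse(xml).Root!.Element("root")!;
var tree = UiNode.FromElement(rootElement);
tree.Print();

// Hypothetical generic node: the tag name is data, not a C# type.
public sealed class UiNode
{
    public string Tag { get; set; } = "";
    public Dictionary<string, string> Attributes { get; set; } = new();
    public List<UiNode> Children { get; set; } = new();

    public static UiNode FromElement(XElement element)
    {
        return new UiNode
        {
            Tag = element.Name.LocalName,
            // Every attribute is kept, whatever its name, so new attributes need no code changes.
            Attributes = element.Attributes().ToDictionary(a => a.Name.LocalName, a => a.Value),
            // Elements() yields direct children only, so each node is visited exactly once.
            Children = element.Elements().Select(FromElement).ToList(),
        };
    }

    public void Print(int depth = 0)
    {
        var attrs = string.Join(", ", Attributes.Select(kv => kv.Key + "=" + kv.Value));
        Console.WriteLine(new string(' ', depth * 2) + Tag + " [" + attrs + "]");
        foreach (var child in Children)
            child.Print(depth + 1);
    }
}
```

The trade-off is that attribute values stay as strings, so any typed access (for example parsing offset into two numbers) has to be done by the caller.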
In a situation where you have a sequence of child elements that have different element names but similar schemas, you can use XmlSerializer’s support for polymorphism to map the elements to a collection of C# polymorphic types. This corresponds to a sequence of <xsd:choice> elements in an XSD schema. The basic idea is to:
- Define some base type TBaseType with the properties common to all elements. (It could be object if there are no common properties.)
- Add derived types TDerivedType corresponding to each specific element name derivedElementName, and apply [XmlType("derivedElementName")] to each.
- In each containing type that has a sequence of TBaseType elements, add a List<TBaseType> property. Then inform the serializer of all possible derived types by applying [XmlElement(typeof(TDerivedType))] for all derived types.
Thus, for your specific XML, first define the following data model:
// The root model <layout>
[XmlType("layout"), XmlRoot("layout")]
public class LayoutModel
{
    [XmlAttribute("version")]
    public uint Version { get; set; }
    [XmlAttribute("comment")]
    public string Comment { get; set; } = "";
    [XmlAttribute("precache_condition")]
    public string PrecacheCondition { get; set; } = "";
    [XmlElement("hierarchy")]
    public LayoutContainerModel? Hierarchy { get; set; }
    [XmlElement("components")]
    public LayoutContainerModel? Components { get; set; }
}
// The container model for <hierarchy> and <components>.
public class LayoutContainerModel
{
    [XmlElement("root")]
    public LayoutItemRoot? Root { get; set; }
    // The same list of polymorphic children must appear in LayoutItemBase.Children
    [XmlElement(typeof(DyTitle)),
     XmlElement(typeof(TemplateRowHeader))
     // Add others as required
    ]
    public List<LayoutItemBase> Children { get; set; } = new();
}
// The type hierarchy for the polymorphic sequence of child elements
public abstract class LayoutItemBase
{
    [XmlAttribute("this")]
    public string? This { get; set; }
    [XmlAttribute("id")]
    public string? Id { get; set; }
    // The same list of polymorphic children must appear in LayoutContainerModel.Children
    [XmlElement(typeof(DyTitle)),
     XmlElement(typeof(TemplateRowHeader))
     // Add others as required
    ]
    public List<LayoutItemBase> Children { get; set; } = new();
}
[XmlType("root")]
public class LayoutItemRoot : LayoutItemBase;
[XmlType("dy_title")]
public class DyTitle : LayoutItemBase
{
    [XmlAttribute("offset")]
    public string? Offset { get; set; }
}
[XmlType("template_row_header")]
public class TemplateRowHeader : LayoutItemBase
{
    [XmlAttribute("offset")]
    public string? Offset { get; set; }
}
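Then, given some static helper method like:
public static class XmlExtensions
{
    // Deserializes the file at the given path, creating a serializer for T unless one is passed in.
    public static T? LoadFromFile<T>(string path, XmlSerializer? serial = null)
    {
        using var stream = File.OpenRead(path);
        return (T?)(serial ?? new XmlSerializer(typeof(T))).Deserialize(stream);
    }
}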
You will be able to load your LayoutModel from a file path as follows:
var model = XmlExtensions.LoadFromFile<LayoutModel>(path);
Notes:
- If the polymorphic element names differ from collection to collection, you can apply [XmlElement(string? derivedElementName, Type? type)] to the collection property, instead of [XmlType(derivedElementName)] to the derived type, to specify both the polymorphic names and types on a per-collection basis.
- I chose to use the same C# type for identical elements. Thus your HierarchyNodeModel and ComponentNodeModel were replaced with a single LayoutContainerModel.
- If you were provided an XSD for your XML file, then xsd.exe will construct a polymorphic type hierarchy for you as long as the XSD contains <xsd:choice> elements. xsd.exe will not, however, infer a polymorphic type hierarchy from some sample XML file (and will sometimes infer an incorrect type hierarchy when presented with polymorphic XML).
- For more on XmlSerializer polymorphism, see
  - Using XmlSerializer to serialize derived classes
  - Keep sort when deserialize and serialize XML using XmlSerializer
Demo fiddle here.
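The per-collection variant mentioned in the first note can be sketched as follows. This is an illustration only, not the data model from the answer above: NodeBase, RootNode, MainChild, ChildA, ChildB and NodeLoader are hypothetical names, and the XML is the small hierarchy fragment from the question. The element names are bound on the collection property with [XmlElement(name, type)], so the derived types carry no [XmlType] attribute.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

var xml = @"<root this=""1234"">
  <main_child this=""5678"">
    <child_a />
    <child_b />
  </main_child>
</root>";

var root = NodeLoader.Load(xml);
NodeLoader.Print(root);

// Hypothetical helper for this sketch.
public static class NodeLoader
{
    public static RootNode Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(RootNode));
        using var reader = new StringReader(xml);
        return (RootNode)serializer.Deserialize(reader)!;
    }

    // Prints the CLR type chosen for each element, indented by depth.
    public static void Print(NodeBase node, int depth = 0)
    {
        Console.WriteLine(new string(' ', depth * 2) + node.GetType().Name + " this=" + node.This);
        foreach (var child in node.Children)
            Print(child, depth + 1);
    }
}

public abstract class NodeBase
{
    [XmlAttribute("this")]
    public string This { get; set; } = "";

    // Each element name is mapped to a derived type here, per collection.
    [XmlElement("main_child", typeof(MainChild))]
    [XmlElement("child_a", typeof(ChildA))]
    [XmlElement("child_b", typeof(ChildB))]
    public List<NodeBase> Children { get; set; } = new();
}

[XmlRoot("root")]
public class RootNode : NodeBase { }
public class MainChild : NodeBase { }
public class ChildA : NodeBase { }
public class ChildB : NodeBase { }
```

As with the [XmlType] approach, every element name still has to be known at compile time, so a new tag in the file means adding another derived type and another [XmlElement] entry.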