How to construct a robust XML parser with error handling in XPath 3.1 and XSLT

My use case: I want to analyze a large XML document that contains elements named ownedComment. Each of these elements has an attribute called body. The content of this attribute should be a string containing a serialized XML document fragment. An example would be

<ownedComment body="&lt;p&gt;This is a &lt;i&gt;comment&lt;/i&gt;&lt;p&gt;"/>

An additional complication is that the serialized document references XML entities that are defined externally, in a separate file.

I am using XSLT 3 and XPath 3.1; the environment is Saxon EE within Oxygen. I have successfully created a function called uml:documentation-parser that creates a parser for a particular entity definition file. It uses the closure technique (for the entity definition) and higher-order functions, since it returns a function with the signature function (element()) as element()*. The semantics of the returned function are: for a given element $e, return the content of $e/ownedComment/@body parsed as an XML document, taking into account the entity definitions from a particular file. The outline of this function is given below:

  <xsl:function name="uml:documentation-parser" as="function (element()) as element()*">
    <xsl:param name="entity-file" as="xs:string?"/>
    <xsl:sequence select="
      (: optional DOCTYPE that pulls in the external entity definitions :)
      let $doctype := if ($entity-file) then
          '&lt;!DOCTYPE root [&lt;!ENTITY % entities SYSTEM ''' || $entity-file || '''>%entities;]>'
      else
          '',
          $prolog := '&lt;root xmlns=''http://docbook.org/ns/docbook''>',
          $epilog := '&lt;/root>'
      return
          function ($element as element(*)) as element()* {
              let $text := $element/ownedComment/@body
              return
                  if ($text) then
                      (concat($doctype, $prolog, $text, $epilog) => parse-xml())/*/*
                  else
                      ()
          }"/>
  </xsl:function>

$prolog and $epilog are needed because the document fragment in @body may contain more than one serialized XML element. They guarantee that there is always a single root element, and they set the namespace.

This works well when the string within @body can be parsed as an XML document. But the parse-xml() function raises the dynamic error err:FODC0006 if the content is not a well-formed and namespace-well-formed XML document.

That’s why I would like to change the signature of the returned function (the parser) to function (element()) as map(*). The idea is that the parser should never raise an error, but always return a map with these entries:

  • text: always the original (unparsed) text. A string.
  • xml: in case of well-formed content, the result of parsing as above. A sequence of elements in the DocBook namespace. Absent if parse-xml() did not succeed.
  • err: in case of a parsing error, the error that was raised by parse-xml().

My problem is that there is no try/catch mechanism in XPath. It’s a feature of XSLT.

My question is: is there any way, using XPath and XSLT in combination, to construct a robust XML parser as the result of a higher-order function that is able to catch dynamic errors?

Thanks in advance,
Frank Steimke

I’ve written a stylesheet based on your code and added an auxiliary function, uml:parse-xml-robustly, which returns a map as you specified. I changed your existing function so that it calls this new function in place of parse-xml() and extracts the parsed XML (if any) from the map it returns.

It wasn’t entirely clear what you wanted to do about errors. You said you wanted your robust parser to return a map whose error key would be associated with the error value. So I chose to return an error in the form of another map, with keys code, description, and value, all with string values.

If the map returned by uml:parse-xml-robustly() doesn’t contain the parsed XML, then I use another auxiliary function to return the error map from that map in the form of an element (because the function is declared to return an element).

As a test I added a template to match an element called element and invoke the function.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:uml="https://example.com/uml"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:err="http://www.w3.org/2005/xqt-errors"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="#all">
  
  <xsl:output method="xml" indent="yes"/>
  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:template match="element">
    <xsl:variable name="parser" select="uml:documentation-parser(())"/>
    <xsl:copy>
      <xsl:sequence select="$parser(.)"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:function name="uml:parse-xml-robustly" as="map(*)">
    <xsl:param name="text" as="xs:string"/>
    <xsl:try>
      <xsl:sequence select="
        map{
          'text': $text,
          'xml': parse-xml($text),
          'error': ()
        }
      "/>
      <xsl:catch select="
        map{
          'text': $text,
          'xml': (),
          'error': map{
            'code': 'Q{' || namespace-uri-from-QName($err:code) || '}' || local-name-from-QName($err:code),
            'description': $err:description,
            'value': $err:value
          }
        }
      "/>
    </xsl:try>
  </xsl:function>
  
  <xsl:function name="uml:error-as-element" as="element(error)">
    <xsl:param name="error" as="map(*)"/>
    <error>
      <xsl:for-each select="map:keys($error)">
        <xsl:attribute name="{.}" select="$error(.)"/>
      </xsl:for-each>
    </error>
  </xsl:function>
  
  <xsl:function name="uml:documentation-parser" as="function (element()) as element()*">
    <xsl:param name="entity-file" as="xs:string?"/>
    <xsl:sequence select="
      let $doctype := if ($entity-file) then
          '&lt;!DOCTYPE root [&lt;!ENTITY % entities SYSTEM ''' || $entity-file || '''>%entities;]>'
      else
          '',
          $prolog := '&lt;root xmlns=''http://docbook.org/ns/docbook''>',
          $epilog := '&lt;/root>'
      return
          function ($element as element(*)) as element()* {
              let $text := $element/ownedComment/@body
              return
                  if ($text) then
                    let $result := 
                      concat($doctype, $prolog, $text, $epilog) => uml:parse-xml-robustly()
                    return
                      (: exactly one of 'xml' and 'error' is non-empty; return every parsed element, not just the first :)
                      ($result('xml')/*/*, $result('error')!uml:error-as-element(.))
                  else
                      ()
          }"/>
  </xsl:function>
</xsl:stylesheet>

Test document:

<root>
  <element>
    <ownedComment body="&lt;p&gt;This is a &lt;i&gt;comment&lt;/i&gt;&lt;/p&gt;"/>
  </element>
  <element>
    <ownedComment body="&lt;p&gt;This is a &lt;i&gt;comment&lt;/i&gt;&lt;p&gt;"/>
  </element>
</root>

Result:

<root>
   <element>
      <p xmlns="http://docbook.org/ns/docbook">This is a <i>comment</i>
      </p>
   </element>
   <element>
      <error code="Q{http://www.w3.org/2005/xqt-errors}FODC0006"
             value="org.xml.sax.SAXParseException; systemId: urn:from-string; lineNumber: 1; columnNumber: 77; The element type "p" must be terminated by the matching end-tag "</p>"."
             description="First argument to parse-xml() is not a well-formed and namespace-well-formed XML document. org.xml.sax.SAXParseException; systemId: urn:from-string; lineNumber: 1; columnNumber: 77; The element type "p" must be terminated by the matching end-tag "</p>".The element type "p" must be terminated by the matching end-tag "</p>"."/>
   </element>
</root>
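
If you would rather have the parser return the map itself, with the signature function (element()) as map(*) from your question, a variant along the following lines might work. Treat it as an untested sketch under some assumptions: the function names uml:try-parse and uml:documentation-map-parser are made up for this illustration, only the first ownedComment/@body of an element is considered, and the error is stored as a small map under the key err rather than as the raw error.

```xslt
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:err="http://www.w3.org/2005/xqt-errors"
  xmlns:uml="https://example.com/uml"
  exclude-result-prefixes="#all">

  <xsl:output method="xml" indent="yes"/>
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical helper: wraps parse-xml() in xsl:try so that the
       caller always gets a map back instead of a dynamic error. -->
  <xsl:function name="uml:try-parse" as="map(*)">
    <xsl:param name="original" as="xs:string"/> <!-- the unparsed @body text -->
    <xsl:param name="wrapped" as="xs:string"/>  <!-- DOCTYPE + root wrapper + text -->
    <xsl:try>
      <!-- success: 'text' and 'xml' are present, 'err' is absent -->
      <xsl:sequence select="map{'text': $original, 'xml': parse-xml($wrapped)/*/*}"/>
      <!-- failure: 'text' and 'err' are present, 'xml' is absent -->
      <xsl:catch select="
        map{
          'text': $original,
          'err': map{'code': $err:code, 'description': $err:description}
        }"/>
    </xsl:try>
  </xsl:function>

  <!-- Hypothetical variant of uml:documentation-parser returning a map-producing parser. -->
  <xsl:function name="uml:documentation-map-parser" as="function(element()) as map(*)">
    <xsl:param name="entity-file" as="xs:string?"/>
    <xsl:sequence select="
      let $doctype := if ($entity-file) then
          '&lt;!DOCTYPE root [&lt;!ENTITY % entities SYSTEM ''' || $entity-file || '''>%entities;]>'
      else
          '',
          $prolog := '&lt;root xmlns=''http://docbook.org/ns/docbook''>',
          $epilog := '&lt;/root>'
      return
          function ($element as element()) as map(*) {
              (: assumption: only the first ownedComment/@body is used;
                 a missing attribute yields the empty string :)
              let $text := string(($element/ownedComment/@body)[1])
              return
                  uml:try-parse($text, concat($doctype, $prolog, $text, $epilog))
          }"/>
  </xsl:function>

  <!-- Usage: copy the parsed elements, or report the error description. -->
  <xsl:template match="element">
    <xsl:variable name="parse" select="uml:documentation-map-parser(())"/>
    <xsl:variable name="result" select="$parse(.)"/>
    <xsl:copy>
      <xsl:sequence select="$result?xml"/>
      <xsl:if test="exists($result?err)">
        <error>
          <xsl:value-of select="$result?err?description"/>
        </error>
      </xsl:if>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the inline function only calls a stylesheet function, the try/catch stays on the XSLT side while the closure over the entity file is still built in XPath. The caller can then decide what to do with a failed parse, for example by testing exists($result?err) as in the template above.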
