This page describes the implementation details of the XML Schema (XSD) 1.1 assertions feature in the Xerces-J XML Schema processor.

Assertions are a new XML validation facility introduced in the XSD 1.1 language. They allow schema authors to constrain XML elements and attributes in ways that were not possible before, which makes XSD a more expressive language. XSD assertions are officially described in the W3C specifications at http://www.w3.org/TR/xmlschema11-1/#cAssertions and http://www.w3.org/TR/xmlschema11-2/#rf-assertions. For an explanation of the assertions feature and a number of practical examples, please read article [2] in the "References" section below.

Status of assertions development in Xerces-J:
The assertions implementation in Xerces-J is complete, and the Xerces team has checked the code into the SVN repository (https://svn.apache.org/repos/asf/xerces/java/branches/xml-schema-1.1-dev/).

User feedback would be helpful at this point.

XSD 1.1 assertions grammar notes (as defined in XSD 1.1 spec)

Assertions usage in Complex Types:

1. Complex Type Definition Schema Components

<complexType
  abstract = boolean : false
  block = (#all | List of (extension | restriction))
  final = (#all | List of (extension | restriction))
  id = ID
  mixed = boolean
  name = NCName
  defaultAttributesApply = boolean : true
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (simpleContent | complexContent | (openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)))
</complexType>

2. Complex Types with Simple Content

<simpleContent
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (restriction | extension))
</simpleContent>

<restriction
  base = QName
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
</restriction>

<extension
  base = QName
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
</extension>

3. Complex Types with Complex Content

<complexContent
  id = ID
  mixed = boolean
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (restriction | extension))
</complexContent>

<restriction
  base = QName
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
</restriction>

<extension
  base = QName
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, openContent?, ((group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*))
</extension>
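As an illustration of the complex type grammar above, here is a minimal sketch of a schema that uses xs:assert. The element names "period", "start" and "end" are hypothetical and chosen only for this example. Note that xs:assert appears last in the type definition, after the content model and any attribute declarations.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: a period whose start date must not be after its end date -->
  <xs:element name="period">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="start" type="xs:date"/>
        <xs:element name="end" type="xs:date"/>
      </xs:sequence>
      <!-- The XPath 2.0 test is evaluated with the "period" element as the context node -->
      <xs:assert test="start le end"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With such a schema, an instance like <period><start>2010-01-01</start><end>2010-12-31</end></period> should satisfy the assertion, while one with the two dates swapped should not.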

Assertions usage in Simple Types:

<simpleType
  final = (#all | List of (list | union | restriction | extension))
  id = ID
  name = NCName
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (restriction | list | union))
</simpleType>

<restriction
  base = QName
  id = ID
  {any attributes with non-schema namespace . . .}>
  Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*))
</restriction>
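The simple type grammar above can be illustrated with a short sketch in which xs:assertion is used as a facet alongside an ordinary facet. The type name "multipleOfFive" and the element name "quantity" are hypothetical. The value being validated is referred to through the variable $value in the test expression.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: a non-negative integer that is a multiple of 5 -->
  <xs:simpleType name="multipleOfFive">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <!-- xs:assertion sits among the other facets; $value holds the value under validation -->
      <xs:assertion test="$value mod 5 = 0"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="quantity" type="multipleOfFive"/>

</xs:schema>
```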

Assertions design approach, within Xerces-J

The Xerces XSModel (an object model representing an XML Schema) has been modified to handle assertions for XML Schema 1.1 simple and complex types.

Because "assertion" is a new constraining facet in XSD 1.1 for simple types, the existing facet data structures and the facet traverser (the traverser is the mechanism Xerces uses to populate the XSModel from XSD syntax) have been enhanced to support XML Schema 1.1 assertions.

To implement assertions for complex types, the complex type traverser of Xerces has been modified.

Assertions XPath 2.0 processing implementation

Xerces provides a Java interface for integrating Xerces with XPath 2.0 evaluation engines. It should be possible to plug any compliant XPath 2.0 engine into the Xerces assertions processor by writing a small Java layer that conforms to this interface.

Xerces provides a sample implementation of this interface for the Eclipse XPath2/PsychoPath XPath 2.0 engine. This should be sufficient for the open source Xerces-J product.

We have tried to make the XPath 2.0 integration for assertions efficient. For nested elements that have assertions, the child and descendant elements share the XPath data model (XDM) tree with the topmost element. For each distinct XDM tree involved in assertions processing, we keep a reference to its root node and discard that reference when assertion evaluation is complete for that tree.

A caution for schema authors: if assertions are specified on the root element of an XML document, the tree built for assertions processing could be as large as the document itself. Where memory is a concern for large XML documents, assertions should be specified at an inner level of the document as far as possible. Doing so causes smaller assertion trees to be built, which reduces the memory footprint of the XML application. It is also worth knowing that assertion trees are generally smaller than the corresponding generic XML infoset trees, because assertion trees contain only element and attribute information along with certain XML Schema type annotations, whereas generic XML trees also contain other information such as processing instructions and comments.
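The advice above can be sketched with a hypothetical schema for a document that holds many "order" elements under a single "orders" root. All element, attribute and type names here are assumptions made for illustration. The constraint is placed on the type of each order rather than on the root, so each assertion tree should cover only one order.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="orders">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="order" type="OrderType" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- No xs:assert here: an assertion on the root type would need a tree for the whole document -->
    </xs:complexType>
  </xs:element>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="placed" type="xs:date" use="required"/>
    <xs:attribute name="shipped" type="xs:date" use="required"/>
    <!-- Inner-level assertion: evaluated once per order, against that order's subtree only -->
    <xs:assert test="@placed le @shipped"/>
  </xs:complexType>

</xs:schema>
```

The same rule could be written on the root type as "every $o in order satisfies $o/@placed le $o/@shipped", but that form would require a tree spanning all orders at once.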

Run-time notes

  1. By default, the Xerces-J schema processor selects the Eclipse/PsychoPath engine for XPath 2.0 processing. Xerces-J allows the user to select between different XPath 2.0 engines for evaluating assertions by setting the Java system property org.apache.xerces.assertProcessor. Xerces-J supports the PsychoPath XPath 2.0 engine out of the box. If the user does not set this property, or sets its value to org.apache.xerces.impl.xs.XMLAssertPsychopathImpl, the PsychoPath processor is selected for XPath 2.0 processing.
  2. If users wish to use an XPath 2.0 engine other than PsychoPath with Xerces-J assertions, they need to write a custom Java layer similar to org.apache.xerces.impl.xs.XMLAssertPsychopathImpl and integrate it with their own XPath 2.0 engine, using that engine's APIs. Xerces-J will publish an API for integrating with any available standards-compliant XPath 2.0 engine.
  3. Error codes and descriptions:

Xerces-J reports errors during assertions processing, as recommended by the XML Schema 1.1 specification. Here is a summary of the error codes and messages used by Xerces during assertion processing:

  1. cvc-assertion.3.13.4.1 -> Xerces uses this error code when any assertion evaluates to false, with a description like the following: "test.xml:3:11:cvc-assertion.3.13.4.1: Assertion evaluation (x = 'hello world') for element 'Example' with type '#anonymous' did not succeed."
  2. cvc-assertion.3.13.4.1 -> Here is an example of the error description when an assertion facet on an attribute value evaluates to false: "test.xml:2:18:cvc-assertion.3.13.4.1: Assertion evaluation ('$value eq 2') for element 'x (attribute => a)' with type '#anonymous' did not succeed."
  3. cvc-assertion.4.3.15.3 -> This error code is used when an attempt is made to access the XPath context (for example, with the expression ".") during assertion facet evaluation, with a description like the following: "test.xml:2:15:cvc-assertion.4.3.15.3: Assertion evaluation (. = 'hello') for element 'x' with type '#anonymous' did not succeed (undefined context)."
  4. cvc-xpath.3.13.4.2 -> This error code is used when a syntactically incorrect XPath expression is provided for assertion evaluation, with a description like the following: "cvc-xpath.3.13.4.2: Assertion XPath expression, (. === 'hello') on the Schema type '#AnonType_xExample' couldn't compile successfully."
  5. Xerces allows schema authors to specify user-defined error messages to be produced on assertion failures. Here is an example illustrating this feature:

XML document [3]:

<test>3</test>

XSD 1.1 schema document [4]:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="test">
    <xs:simpleType>
      <xs:restriction base="xs:positiveInteger" xmlns:xerces="http://xerces.apache.org">
        <xs:assertion test="$value mod 2 = 0" xerces:message="Value of element 'test' must be divisible by 2" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

When the XML document [3] above is validated against the corresponding schema document [4], the following error message is displayed:

"test.xml:1:15: cvc-assertion.failure: Assertion failure. Value of element 'test' must be divisible by 2."

Notes:

  1. A user-defined error message for an assertion must be specified as the value of the attribute "message" (which must be in the XML namespace "http://xerces.apache.org") on the xs:assert or xs:assertion element.
  2. The mechanism for specifying user-defined error messages on XSD 1.1 assertions in Xerces-J may change in the future if the XSD WG provides specific guidance on this in the final XSD 1.1 language recommendation.
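Since note 1 says the "message" attribute also applies to xs:assert, here is a minimal sketch of the same mechanism on a complex type. The element name "range", its attributes and the message text are hypothetical; only the attribute name "message" and its namespace come from the notes above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xerces="http://xerces.apache.org">

  <!-- Hypothetical element: a range whose lower bound must not exceed its upper bound -->
  <xs:element name="range">
    <xs:complexType>
      <xs:attribute name="min" type="xs:integer" use="required"/>
      <xs:attribute name="max" type="xs:integer" use="required"/>
      <!-- The message attribute must be in the http://xerces.apache.org namespace -->
      <xs:assert test="@min le @max"
                 xerces:message="Attribute 'min' must not be greater than attribute 'max'"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```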

References:

  1. An introduction to XML Schema 1.1, An overview of the key improvements over XML Schema 1.0 and an in-depth look at datatypes
  2. An introduction to XML Schema 1.1, Co-occurrence constraints using XPath 2.0

Mukul Gandhi
