Overview

XML Schema 1.1 [1][2] specifies a new version of the W3C XML Schema definition language, which offers facilities for describing the structure and constraining the contents of XML documents. Among the many changes introduced in XML Schema 1.1 are improvements that support better versioning of XML languages defined using XML Schemas, and a facility for specifying co-occurrence constraints (see sections 3.12 [3] and 3.13 [4] of Structures) using XPath 2.0 expressions. Schema 1.1 also introduces four new built-in datatypes: a new decimal type (precisionDecimal), which retains information about the precision of the value, and three others (anyAtomicType, dayTimeDuration and yearMonthDuration) that align with the type system defined for XSLT 2.0 and XQuery 1.0. Detailed lists of changes since XML Schema 1.0 are available for Structures [5] and for Datatypes [6].

Design Thoughts

Xerces-J 2.10.0 should provide experimental support for the features defined in the XML Schema 1.1 specification. The design of the new XML Schema 1.1 componentry should minimize the performance impact on the existing XML Schema 1.0 implementation.

External APIs

Previous versions of Xerces-J have provided several methods for performing schema validation. An application could use one or a combination of the validation API (javax.xml.validation) introduced in JAXP 1.3, the JAXP 1.2 schema source and schema location properties, the DOM Level 3 schema-type and schema-location parameters, Xerces-J specific features and properties, grammar pools and various implementation classes like XMLGrammarPreparser.

While one could define what XML Schema 1.1 processing means with respect to all of the above, it may be beneficial to constrain the support to the JAXP Validation API alone (at least initially). If XML Schema 1.1 validation support is only exposed through JAXP, it would enforce good practices (i.e. grammar caching and coding to standard supported APIs) and ensure that the semantics of XML Schema 1.1 are applied consistently from schema loading through instance validation. If there is justified demand for other methods of enabling XML Schema 1.1 validation, the support could always be added later.

Changes to support XML Schema 1.1 in JAXP

The entry point to the JAXP validation API is the SchemaFactory. By calling one of the newInstance() methods, an application obtains a concrete implementation of the factory for a given schema language. Schema languages are identified by a URI. The XML Schema 1.1 specification defines several URIs [7] for identifying the XML Schema language and its various versions and editions. In addition to supporting http://www.w3.org/2001/XMLSchema, which JAXP 1.3 defines to be an identifier for XML Schema 1.0, Xerces-J will also recognize http://www.w3.org/XML/XMLSchema/v1.0 as an identifier for XML Schema 1.0 and http://www.w3.org/XML/XMLSchema/v1.1 as an identifier for XML Schema 1.1.

If the schema language identifier passed to SchemaFactory.newInstance() is equal to http://www.w3.org/2001/XMLSchema or http://www.w3.org/XML/XMLSchema/v1.0, it will return a SchemaFactory which supports compilation of XML Schema 1.0 schemas into a Schema object, which can then be used to validate instance documents according to the rules of XML Schema 1.0.

If the SchemaFactory provides support for the XML Schema 1.0 language, isSchemaLanguageSupported() will return true if the string passed to it is equal to http://www.w3.org/2001/XMLSchema or http://www.w3.org/XML/XMLSchema/v1.0; otherwise it will return false.
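As an illustration of the XML Schema 1.0 path described above, the following is a minimal sketch using only the standard javax.xml.validation classes. The class name Schema10Example, the helper isValid() and the one-element schema are hypothetical and exist only for this example; they are not part of Xerces-J.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class Schema10Example {
    // Hypothetical schema used only for illustration: one element of type xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    /** Returns true if the instance document is valid against XSD. */
    static boolean isValid(String instanceXml) throws Exception {
        // http://www.w3.org/2001/XMLSchema selects an XML Schema 1.0 factory.
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Compile the schema document into a reusable Schema object.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>forty-two</count>"));
    }
}
```

The Schema object is the unit that an application would cache and share, which is the grammar caching practice that the JAXP-only approach is meant to encourage.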

If the schema language identifier passed to SchemaFactory.newInstance() is equal to http://www.w3.org/XML/XMLSchema/v1.1, it will return a SchemaFactory which supports compilation of XML Schema 1.1 schemas into a Schema object, which can then be used to validate instance documents according to the rules of XML Schema 1.1.

If the SchemaFactory provides support for the XML Schema 1.1 language, isSchemaLanguageSupported() will return true if the string passed to it is equal to http://www.w3.org/XML/XMLSchema/v1.1; otherwise it will return false.
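The lookup rules above could be exercised with a sketch along the following lines. The class SchemaLanguageLookup and its lookup() helper are hypothetical names introduced for illustration. The sketch assumes only the documented JAXP behaviour that SchemaFactory.newInstance() throws IllegalArgumentException when no implementation is available for an identifier, so it also runs on a platform that has no XML Schema 1.1 factory installed.

```java
import javax.xml.validation.SchemaFactory;

public class SchemaLanguageLookup {
    static final String XSD_2001 = "http://www.w3.org/2001/XMLSchema";
    static final String XSD_V10 = "http://www.w3.org/XML/XMLSchema/v1.0";
    static final String XSD_V11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    /** Returns a factory for the identifier, or null if none is available. */
    static SchemaFactory lookup(String schemaLanguage) {
        try {
            return SchemaFactory.newInstance(schemaLanguage);
        } catch (IllegalArgumentException e) {
            // No implementation for this schema language could be found.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String id : new String[] { XSD_2001, XSD_V10, XSD_V11 }) {
            SchemaFactory factory = lookup(id);
            if (factory == null) {
                System.out.println(id + " -> no factory available");
            } else {
                // Under the design above, a 1.0 factory would report false for
                // the v1.1 identifier and a 1.1 factory would report true.
                System.out.println(id + " -> " + factory.getClass().getName()
                    + ", supports v1.1: "
                    + factory.isSchemaLanguageSupported(XSD_V11));
            }
        }
    }
}
```

Which identifiers resolve depends on the SchemaFactory implementations present on the class path, so the output is deliberately not shown here.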

If no concrete implementation of SchemaFactory is found by the first three steps in the discovery process, newInstance() should return a platform default implementation. All implementations of JAXP 1.3 (and above) must provide a default SchemaFactory for W3C XML Schema 1.0 (i.e. http://www.w3.org/2001/XMLSchema). Default implementations for the other XML Schema language identifiers described above will also be provided by Xerces-J.

JAXP 1.4 introduced an additional newInstance() method which allows an application to select a specific SchemaFactory implementation by schema language and class name. The table below lists all of the legal combinations for the implementations which will be provided by Xerces-J.

| Schema Version | Schema Language Identifier | SchemaFactory Implementation Class |
|---|---|---|
| 1.0 | http://www.w3.org/2001/XMLSchema | org.apache.xerces.jaxp.validation.XMLSchemaFactory |
| 1.0 | http://www.w3.org/XML/XMLSchema/v1.0 | org.apache.xerces.jaxp.validation.XMLSchemaFactory |
| 1.1 | http://www.w3.org/XML/XMLSchema/v1.1 | org.apache.xerces.jaxp.validation.XMLSchema11Factory |
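Selecting an implementation by schema language and class name, as described above, might look like the following sketch. It uses the JAXP 1.4 method SchemaFactory.newInstance(String, String, ClassLoader); the class ExplicitFactorySelection and its select() helper are hypothetical. The sketch assumes that the Xerces-J class named in the table may not be on the class path, in which case the JAXP method throws IllegalArgumentException and the helper returns null.

```java
import javax.xml.validation.SchemaFactory;

public class ExplicitFactorySelection {
    static final String XSD_V11 = "http://www.w3.org/XML/XMLSchema/v1.1";
    // Class name taken from the table above; it is only loadable when a
    // Xerces-J build that provides it is on the class path.
    static final String XERCES_11_FACTORY =
        "org.apache.xerces.jaxp.validation.XMLSchema11Factory";

    /** Returns the named factory, or null if it cannot be used for the language. */
    static SchemaFactory select(String schemaLanguage, String factoryClassName) {
        try {
            // A null ClassLoader means the current thread's context class
            // loader is used to load the factory class.
            return SchemaFactory.newInstance(schemaLanguage, factoryClassName, null);
        } catch (IllegalArgumentException e) {
            // The class could not be loaded or instantiated, or it does not
            // support the requested schema language.
            return null;
        }
    }

    public static void main(String[] args) {
        SchemaFactory factory = select(XSD_V11, XERCES_11_FACTORY);
        System.out.println(factory == null
            ? "XML Schema 1.1 factory not available"
            : "Using " + factory.getClass().getName());
    }
}
```

Naming the class explicitly bypasses the discovery process, which can be useful when several JAXP implementations are present and a specific one is required.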

Changes to the XML Schema API to support XML Schema 1.1

In order to support the schema component model defined by the XML Schema 1.1 specification, the XML Schema API (i.e. org.apache.xerces.xs.*) will need to be updated. Additions and changes to the API are TBD.

Parser Features and Properties

Property: http://apache.org/xml/properties/validation/schema/datatype-xml-version

Value Type: a java.lang.String object

Default Value: null

Access: read-write

Description: This property allows the user to select which version of the XML specification is used to define the lexical and value space for datatypes (such as QName, ID and IDREF) which depend on productions from the XML specification. The values currently supported are "1.0", "1.1" and null. If the value of this property is null, the definition of the lexical and value space for these types will be determined by the XML version of the document in which they are contained.

Note: This property also applies to the regular expression grammar used for specifying pattern facets.
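The text above does not state which object the property is set on. The sketch below assumes, purely for illustration, that it is exposed through SchemaFactory.setProperty(); the class DatatypeXmlVersion and its trySet() helper are hypothetical. An implementation that does not know the property reports this with a SAXNotRecognizedException or SAXNotSupportedException, which the helper turns into a false return value.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class DatatypeXmlVersion {
    // Property identifier as given in the description above.
    static final String PROPERTY =
        "http://apache.org/xml/properties/validation/schema/datatype-xml-version";

    /** Tries to set the property; returns false if the factory rejects it. */
    static boolean trySet(SchemaFactory factory, String xmlVersion) {
        try {
            // Assumption: the property is settable on the SchemaFactory.
            factory.setProperty(PROPERTY, xmlVersion);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // This implementation does not recognize or support the property.
            return false;
        }
    }

    public static void main(String[] args) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        System.out.println("datatype-xml-version applied: " + trySet(factory, "1.1"));
    }
}
```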

Messages and Error Codes

Error codes reported by Xerces-J for XML Schema 1.1 errors should be consistent with those defined in the XML Schema 1.1 specification [8][9][10]. Error messages from the XML Schema 1.0 implementation should be reused where possible. Additional error messages will be required for the new constraints introduced by XML Schema 1.1.

References

[1] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/

[2] http://www.w3.org/TR/2006/WD-xmlschema11-2-20060217/

[3] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#cTypeAlternative

[4] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#cAssertions

[5] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#changes

[6] http://www.w3.org/TR/2006/WD-xmlschema11-2-20060217/#changes

[7] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#langids

[8] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#validation_failures

[9] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#outcome-src

[10] http://www.w3.org/TR/2007/WD-xmlschema11-1-20070830/#outcome-cos
