GSoC xs:override proposal

Project	Implementing XML Schema 1.1 overriding component definitions (<xs:override>)
Student Name	Udayanga Wickrmasinghe
Email	mastershield2007@gmail.com
TimeZone	GMT +5:30

Abstract

Apache Xerces2-J consists of a set of standards-compliant XML parsers and an XML Schema processor [1], built on top of a complete and extremely modular framework (XNI) for building parser components and configurations. Although the Xerces XML Schema processor supports more than the minimal requirements of the W3C XML Schema 1.1 specification [2], support for some vital schema components is still to be realized. This project aims to implement one such requirement, namely xs:override support for XML Schema 1.1.

Description

The XML Schema 1.1 specification defines the syntax and semantics of “Overriding component definitions (<override>)” [3], or xs:override, under “Schemas and Namespaces: Access and Composition”. The new construct is a powerful addition to the XML Schema composition framework and tries to mitigate some of the constraints of similar constructs such as xs:redefine. The <redefine> construct, defined in “Including modified component definitions (<redefine>)” [4] of the XML Schema specification, is useful for schema evolution and versioning, but it can be used only when a restriction or extension relation exists between the old component and the new, redefined component. There are occasions when the schema author simply wants to replace old components with new ones without any such constraint. In addition, existing XSD processors have implemented conflicting and non-interoperable interpretations of <redefine>, and the <redefine> construct is declared deprecated in XML Schema 1.1 [2]. The <override> construct therefore tries to avoid these limitations and allow unconstrained replacement as and when needed.
According to the XML Schema 1.1 specification, the xs:override schema component is specified in the following form (the override information item):

<override
      id = ID
      schemaLocation = anyURI
      {any attributes with non-schema namespace . . .}
>
      Content: (annotation | (simpleType | complexType | group | attributeGroup | element | attribute | notation))*
</override>

Here “schemaLocation” indicates the location of the overridden schema document, while “Content” corresponds to the types, groups, attributes and elements with which this schema overrides the definitions in the schema document at that “schemaLocation”. The semantics of xs:override are very similar to class or prototypical inheritance: after successful application, the matching components of the overridden schema are replaced by the new overriding components contained in the <xs:override> element. The following gives a general overview of the xs:override criteria that the implementation of XML Schema composition should consider.


1. An override only applies if the schema component within <xs:override> exists in the overridden schema (the one at the given schemaLocation). If it does not, there is no effect on the overridden schema and the overriding component will not exist in the final schema representation.

2. Each <override> information item is subjected to the “override transformation” [5]. However, when the target namespaces of the overriding and overridden schemas do not match, the “chameleon inclusion transformation” [6] is also performed prior to the override transformation. The override transformation itself is pervasive and is therefore also applied to <include> information items in the overridden schema (i.e. if schema A overrides B and B includes C, then C will also be overridden accordingly). Furthermore, the override transformation applies to <override> information items present in the overridden schema, by merging.
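
The replacement step of the criteria above can be sketched with the JDK's DOM API. This is only an illustration under simplifying assumptions, not the transformation defined in [5]: the class OverrideTransformSketch and its methods are hypothetical names, a component is matched by element kind plus its name attribute, and the handling of <include>, nested <override> items and chameleon inclusion is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not a Xerces class.
final class OverrideTransformSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Parses schema text into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema text", e);
        }
    }

    static boolean isSchemaElement(Node n) {
        return n instanceof Element && XS.equals(n.getNamespaceURI());
    }

    // Finds the top-level component with the same kind (local name) and the same name attribute.
    static Element findMatch(Element schemaRoot, Element wanted) {
        for (Node n = schemaRoot.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!isSchemaElement(n)) continue;
            Element candidate = (Element) n;
            if (candidate.getLocalName().equals(wanted.getLocalName())
                    && candidate.hasAttribute("name")
                    && candidate.getAttribute("name").equals(wanted.getAttribute("name"))) {
                return candidate;
            }
        }
        return null;
    }

    // Replaces matching components of the overridden schema in place and
    // returns how many were replaced. Children without a match are ignored.
    static int apply(Element override, Element overriddenRoot) {
        Document target = overriddenRoot.getOwnerDocument();
        int replaced = 0;
        for (Node n = override.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!isSchemaElement(n) || "annotation".equals(n.getLocalName())) continue;
            Element replacement = (Element) n;
            Element old = findMatch(overriddenRoot, replacement);
            if (old != null) {
                overriddenRoot.replaceChild(target.importNode(replacement, true), old);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Document overridden = parse("<xs:schema xmlns:xs='" + XS + "'>"
                + "<xs:element name='a' type='xs:string'/>"
                + "<xs:element name='b' type='xs:string'/></xs:schema>");
        Document overriding = parse("<xs:schema xmlns:xs='" + XS + "'>"
                + "<xs:override schemaLocation='overridden.xsd'>"
                + "<xs:element name='a' type='xs:int'/>"
                + "<xs:element name='missing' type='xs:int'/>"
                + "</xs:override></xs:schema>");
        Element override = (Element) overriding.getDocumentElement()
                .getElementsByTagNameNS(XS, "override").item(0);
        System.out.println("replaced: " + apply(override, overridden.getDocumentElement()));
    }
}
```

In the usage shown in main, only element a has a counterpart in the overridden document, so it is the only component replaced; the declaration named missing has no effect, as criterion 1 requires.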

Although the idea behind xs:override seems rather simple, several scenarios need to be considered in which complications inevitably arise. The following are several such considerations.

Circular includes and overrides can create duplicate components, which should be flagged as errors (i.e. if schema A includes B, B overrides C, and C in turn includes B, then we could end up with different versions of B included in A and in C. The versions of B included in A and C are considered the same only if the C->B override transformation does not apply).

This scenario can occur with circular overrides as well (i.e. if schema A overrides B and B overrides A, and the B->A override transformation affects schema A, then duplicate-component errors will occur).

In general, if overriding schemas produce the same override transformation result for an included schema, the results are considered the same; otherwise duplicates will occur (i.e. if schema A includes B and C, and B and C both in turn override D, then B and C include the same version of D if the B->D and C->D override transformations do not affect D, or if B and C both contain the same overriding schema components).
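
One way to picture the "same version or duplicate" decision described in these scenarios is a registry keyed by schema location that remembers which overrides actually took effect. The sketch below is an assumption made for illustration, not a design taken from Xerces: the class OverrideVersionRegistry, the string keys such as "element:x", and the use of plain text to stand for a component definition are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical bookkeeping for illustration only; not a Xerces class.
final class OverrideVersionRegistry {
    enum Outcome { FIRST_VISIT, SAME_VERSION, CONFLICTING_VERSION }

    // schema location -> overrides that actually matched when it was first loaded
    private final Map<String, Map<String, String>> seen = new HashMap<>();

    // declared:  keys of the global components the document declares, e.g. "element:x"
    // overrides: component key -> overriding definition requested by the referrer
    Outcome register(String location, Set<String> declared, Map<String, String> overrides) {
        Map<String, String> effective = new TreeMap<>(overrides);
        // Overrides without a matching component have no effect on the document.
        effective.keySet().retainAll(declared);
        Map<String, String> previous = seen.putIfAbsent(location, effective);
        if (previous == null) {
            return Outcome.FIRST_VISIT;
        }
        // Equal effective overrides yield the same transformed document, so it can be
        // reused; anything else would declare the same components twice.
        return previous.equals(effective) ? Outcome.SAME_VERSION : Outcome.CONFLICTING_VERSION;
    }

    public static void main(String[] args) {
        OverrideVersionRegistry registry = new OverrideVersionRegistry();
        Set<String> declaredInD = Set.of("element:x", "complexType:T");
        // B overrides D, then C overrides D with a different definition of x.
        System.out.println(registry.register("D.xsd", declaredInD, Map.of("element:x", "definition from B")));
        System.out.println(registry.register("D.xsd", declaredInD, Map.of("element:x", "definition from C")));
    }
}
```

A conflicting outcome corresponds to the duplicate-component error in the scenarios above, while two referrers whose overrides do not touch the document at all both map to the same empty set and therefore share one version.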

The implementation of xs:override should take the aforementioned factors into account so that dependencies are correctly evaluated and the necessary schema preprocessing is performed.
Xerces2-J XML Schema processing and its supporting structures are mainly handled by classes located in the org.apache.xerces.impl.xs package. XMLSchemaFactory and XMLSchemaLoader are primarily responsible for loading (i.e. #loadGrammar()) and handling schemas from a set of sources. XMLSchemaLoader acts as a wrapper that provides the necessary inputs to an XSDHandler (org.apache.xerces.impl.xs.traversers) instance, which in turn coordinates the construction of the grammar object corresponding to a schema over several stages. The XSDHandler instance is responsible for parsing each schema source (including imported ones, which result in other grammars) and for preprocessing, resolving and loading grammars.
XSDHandler#parseSchema() is responsible for coordinating these critical stages of schema composition, which include:
a) constructTrees - constructs the schema objects, attempts to resolve <include> and <redefine> schema components, and builds a dependency map.
b) buildGlobalNameRegistries - builds registries for all globally referenceable names and keeps track of the <redefine> component mapping.
c) traverseSchemas - traverses globally declared elements with the appropriate traverser object (i.e. SimpleType/ComplexType/Attribute traversers) and handles them accordingly.
d) traverseLocalElements - traverses all the deferred local elements.
e) resolving ID/key references.
f) storing imported grammars and building the grammar pool.
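
To make the dependency-map idea of the first stage concrete, the following sketch collects the composition references (<include>, <redefine> and, as this project proposes, <override>) from a set of in-memory schema documents. It is a simplified assumption and does not reproduce XSDHandler's internal data structures; the class DependencyMapSketch, its method names and the "kind:location" edge format are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not a Xerces class.
final class DependencyMapSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Returns "kind:schemaLocation" edges for the composition elements
    // found at the top level of one schema document.
    static List<String> references(String schemaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)))
                    .getDocumentElement();
            List<String> edges = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
                String kind = n.getLocalName();
                if (kind.equals("include") || kind.equals("redefine") || kind.equals("override")) {
                    edges.add(kind + ":" + ((Element) n).getAttribute("schemaLocation"));
                }
            }
            return edges;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema text", e);
        }
    }

    // Builds location -> outgoing references for every known document.
    static Map<String, List<String>> build(Map<String, String> documentsByLocation) {
        Map<String, List<String>> dependencies = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : documentsByLocation.entrySet()) {
            dependencies.put(entry.getKey(), references(entry.getValue()));
        }
        return dependencies;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("A.xsd", "<xs:schema xmlns:xs='" + XS + "'><xs:include schemaLocation='B.xsd'/></xs:schema>");
        documents.put("B.xsd", "<xs:schema xmlns:xs='" + XS + "'><xs:override schemaLocation='C.xsd'/></xs:schema>");
        documents.put("C.xsd", "<xs:schema xmlns:xs='" + XS + "'/>");
        System.out.println(build(documents));
    }
}
```

A map of this shape is enough to detect the circular include and override chains discussed earlier before any component is traversed.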

The xs:override implementation intends to extend this functionality to implement the <override> semantics. Several supporting structures will also be needed so that <override> schema components are identified during schema processing, for example by extending the following classes.

• org.apache.xerces.impl.xs.SchemaSymbols – keeps track of the collection of symbols used in parsing a schema grammar. New <override> grammar symbols need to be introduced here.
• org.apache.xerces.impl.xs.XSDDescription – keeps track of all information specific to XML Schema grammars. This can be used to indicate to the schema processor that the current schema document is overridden by another schema document.
Additionally, the xs:override implementation may require several new components and data structures to be added to the org.apache.xerces.impl.xs package in order to handle the different <override> scenarios described earlier.

Things I have Done So far

Since this project is about implementing an XML Schema 1.1 specification construct, I went through the specification documents several times to understand the exact structure and semantics of the component I am going to implement, which I think is of vital importance for the design and implementation. Previous discussions of xs:override support in the Xerces-J dev mailing list archives, online articles and tutorials also helped a lot. I also interacted with the Xerces mailing list (especially with my mentor) to clarify critical points and implementation details. Since knowing Xerces and its internal framework (XNI) is essential for the implementation, I dug into various documentation, API information and samples covering Xerces design, architecture and especially XML Schema processing. I downloaded the Xerces2-J source code from trunk and built it in order to try out some samples and get the hang of the flow of schema loading and processing.

Development Schedule

Time Schedule/Duration	Activity
March 18 - March 29	Initiate ideas, discuss project details, get feedback on different aspects of the project, etc.
March 29 - April 9	Prepare and submit the project proposal
April 26	GSoC accepted student proposals announced by Google
April 26 - May 24 (Community Bonding Period)	Prepare design, architecture and deployment aspects of the xs:override implementation; study platform details (i.e. Xerces architecture, schema processing, etc.); prepare the development environment
May 24 - July 12	Start coding the xs:override implementation
July 12 - July 16	Mid-term evaluations: students and mentors submit evaluations
July 16 - August 9	Start the second phase of coding; write tests for xs:override validation
August 9 - August 16	Final week of the project: refine and review code, finalize documentation, final code submission on August 16th
August 23	Final results of GSoC 2010 announced

Deliverables

Source/patch related to xs:override implementation
A solid set of test cases to verify the related aspects of xs:override schema composition
Documentation (Javadoc + design details) on the xs:override implementation/API

Community Interaction

Initially I had trouble selecting a project, since the project I had in mind was already taken. The Xerces-J mailing list was really helpful here, giving me a lot of feedback on available projects, including ones that were not initially declared as summer projects for 2010. I was later able to gain a lot of insight into the xs:override specification semantics and implementation details through mail exchanges with the Xerces-J dev community. Through this I managed to dig into and clarify many details that will be very helpful throughout my project, and it has been much-needed guidance in writing this proposal as well.

About me

I'm a final-year Computer Science and Engineering undergraduate at the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. I'm passionate about computer science and am especially interested in compiler theory, distributed systems, enterprise middleware and artificial intelligence. I have experience in open source development and have always loved working in such a dynamic and encouraging environment.
I have worked on projects related to Apache Axis2, where I developed a tool [7] (incubating) to extract WS-Policy (i.e. security policy) from WS-Policy compliant SOAP messages. The tool is especially useful for clients who wish to build a compatible client-side policy for services that do not expose their messaging policy explicitly. This gave me a good understanding of the WS-Security specifications and of the mechanisms used by security modules such as Apache Rampart. Furthermore, I developed the Axis2 messaging and service-level infrastructure for Ruby scripts [8], so that Ruby scripts can be exposed as web services and accessed by clients.
I also have experience in projects related to Eclipse plugins, OSGi, XML parsers (for our internal DSD2.0 [9] parser module) and data mining (i.e. collaborative filtering), which got me working with a wide variety of frameworks and programming/scripting languages such as Java, C, C++, JavaScript and Ruby, on both Linux and Windows. For our final-year project I am currently implementing a TupleSpace-based distributed system framework, which runs on top of FreePastry (an open source implementation of Microsoft Pastry, a distributed hash table) and facilitates time and space decoupling as well as content-based addressing for messages in a distributed environment [10]. I consider myself a motivated computer science enthusiast who is willing to learn independently, accept challenges and meet them to the best of my ability.