This document explains the design of the new Axis Object Model (referred to as AXIOM, or just OM for convenience). The following criteria have been selected as the requirements for AXIOM.
1 Lower memory footprint and lightweight object structure.
- The memory footprint of the object model has become a significant concern in Axis. The object structures currently in use are memory intensive.
2 DOM support
- The DOM API is required to implement the security services (http://www.xml.apache.org/security), so AXIOM must support DOM.
3 Pull support
- AXIOM exposes a pull API, motivated by the recent success of pull-based parsing. Pull-based parsing is one of the fastest and most efficient approaches to XML processing, as demonstrated by projects such as Axismora.
The rest of this document explains the proposed design approach for the AXIOM.
High level components in AXIOM
The following figure shows the highest-level abstract view of AXIOM.
AXIOM exposes a DOM API and a Pull API to access the underlying OM model. The OM model itself, however, is completely transparent (and irrelevant) to the user. The design depends on this concept and initially concentrates on an efficient structure for the OM model. In deciding on the OM model, the following criteria have been considered:
- Efficiency in storage
- Deferred parsing
- Support for the DOM and Pull APIs (as required)
In order to fulfil all these requirements, a table model has been selected as the base of the OM model. Xalan provides an excellent guide to the use of table models in XML representations (http://xml.apache.org/xalan-j).
AXIOM table model (AXTM)
The easiest way to deal with table data structures at the conceptual level is to model them as an entity relationship (ER). The following diagram shows the entity relationship of the AXTM. Note that this diagram only highlights the basic entities; it is not a complete design that includes the full set of attributes associated with each entity. Such a detailed ER diagram is included in the detailed design section.
The advantages of this table model are numerous.
- The data contained within the tables can be normalised to minimize redundancy and hence reduce the memory requirement.
- The DOM and the Pull APIs can refer to the data items in the tables without creating an extra memory representation of their own. This reduces the memory requirement further.
- Searching is faster. For example, searching the table for a particular set of elements is faster than traversing the DOM tree data structure.
The next section describes this table structure in detail and the approach taken to efficiently provide the DOM and Pull APIs using the table model.
Detailed AXIOM table model
Let us look at an example:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <entry name="Deepal" xmlns="example.com">
      <telno>77777777</telno>
      <address>LSF</address>
    </entry>
    <entry name="Ajith">
      <telno>6666666</telno>
      <address>LSF</address>
    </entry>
  </soap:Body>
</soap:Envelope>
The following diagram shows the detailed entity relationship
Note the following features about this ER
- Event refers to the relevant events fired by the pull parser. The necessary events, such as START_ELEMENT and START_TEXT, are recorded in the Event entity along with references to the relevant Element and Text entities.
- The Element, Attribute and Text entities each contain a reference to their parent and a reference to their next sibling.
- The Text entity also carries comments, CDATA and other textual information items, since these contain similar information. For this reason the Text entity has a type attribute that stores the type of the text contained.
The table model is built according to this ER
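As a rough illustration of the ER described above, the following Java sketch stores elements, text items and events as rows that refer to each other by integer keys. Every class, field and method name here is an assumption made for illustration (the source does not define them), attributes are left out for brevity, and -1 is used as an arbitrary marker for "no such row".

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; -1 means "no such row".
final class ElementRow {
    final int key;            // primary key of this row
    final String localName;
    final int parentKey;      // key of the parent element row
    int nextSiblingKey = -1;  // key of the next sibling element row

    ElementRow(int key, String localName, int parentKey) {
        this.key = key;
        this.localName = localName;
        this.parentKey = parentKey;
    }
}

final class TextRow {
    // The type attribute distinguishes plain text, comments and CDATA.
    enum Type { TEXT, COMMENT, CDATA }

    final int key;
    final Type type;
    final String value;
    final int parentKey;

    TextRow(int key, Type type, String value, int parentKey) {
        this.key = key;
        this.type = type;
        this.value = value;
        this.parentKey = parentKey;
    }
}

final class EventRow {
    enum Kind { START_ELEMENT, START_TEXT }

    final Kind kind;
    final int targetKey;  // key into the element or text table, depending on kind

    EventRow(Kind kind, int targetKey) {
        this.kind = kind;
        this.targetKey = targetKey;
    }
}

final class AxtmSketch {
    final List<ElementRow> elements = new ArrayList<>();
    final List<TextRow> texts = new ArrayList<>();
    final List<EventRow> events = new ArrayList<>();

    // Adds an element row and records the matching event.
    int addElement(String localName, int parentKey) {
        int key = elements.size();
        elements.add(new ElementRow(key, localName, parentKey));
        events.add(new EventRow(EventRow.Kind.START_ELEMENT, key));
        return key;
    }

    // Adds a text row and records the matching event.
    int addText(TextRow.Type type, String value, int parentKey) {
        int key = texts.size();
        texts.add(new TextRow(key, type, value, parentKey));
        events.add(new EventRow(EventRow.Kind.START_TEXT, key));
        return key;
    }

    // Builds the rows for the first <entry> of the example document.
    static AxtmSketch firstEntry() {
        AxtmSketch model = new AxtmSketch();
        int entry = model.addElement("entry", -1);
        int telno = model.addElement("telno", entry);
        model.addText(TextRow.Type.TEXT, "77777777", telno);
        int address = model.addElement("address", entry);
        model.elements.get(telno).nextSiblingKey = address;
        model.addText(TextRow.Type.TEXT, "LSF", address);
        return model;
    }
}
```

Because rows refer to each other by key, an API layer can hand out views over the rows instead of copying their data.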
The framework for table and rows
The table is a very important structure in this design, so the design of the table framework carries special importance. Tables are built from rows. A row object contains a number of data items, and the table is simply a container for the row objects. The internal implementation of the table may (or may not) use collection classes, but the table should effectively encapsulate that functionality. The following class diagrams depict the structure of the table framework. Note that the class diagrams do not contain all the attributes and operations, in order to keep the diagrams comprehensible.
The use of the table however may depend on the type of row being stored. The default mechanism of searching (by primary key) is incorporated into the base table class but when a specific search mechanism is required the table class may be sub-classed to implement the row specific search mechanism.
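The arrangement described above might be sketched as follows: a base table that offers the default search by primary key, and a subclass that adds a row-specific search. The names Row, Table, AttrRow and AttrTable are hypothetical, and the use of a map internally is only one possible choice that the table hides from its callers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base row: carries only its primary key.
abstract class Row {
    private final int key;

    Row(int key) {
        this.key = key;
    }

    int getKey() {
        return key;
    }
}

// Hypothetical base table: a container of rows with the default search by primary key.
class Table<R extends Row> {
    private final Map<Integer, R> rows = new LinkedHashMap<>();

    void add(R row) {
        rows.put(row.getKey(), row);
    }

    // Default search mechanism; returns null when the key is unknown.
    R findByKey(int key) {
        return rows.get(key);
    }

    // Lets subclasses scan the rows without exposing the map itself.
    protected Iterable<R> allRows() {
        return rows.values();
    }
}

final class AttrRow extends Row {
    final int parentKey;  // key of the owning element row
    final String name;
    final String value;

    AttrRow(int key, int parentKey, String name, String value) {
        super(key);
        this.parentKey = parentKey;
        this.name = name;
        this.value = value;
    }
}

// Row-specific search added by sub-classing the table.
final class AttrTable extends Table<AttrRow> {
    List<AttrRow> findByParent(int parentKey) {
        List<AttrRow> result = new ArrayList<>();
        for (AttrRow row : allRows()) {
            if (row.parentKey == parentKey) {
                result.add(row);
            }
        }
        return result;
    }
}
```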
The DOM interface
The implementation of the DOM interface is more or less straightforward. Since one row of one table refers to a specific XML info set item (such as an element or an XML attribute), the DOM interfaces are implemented on the relevant row classes. The following class diagram depicts how the DOM interfaces are implemented. Note that for brevity the full set of DOM interfaces is not shown.
The DOM API classes are implemented so that they do not create new objects but refer to the ones that are already in memory. For instance, when the text nodes of a particular element are required, the Element searches internally and produces them. In most cases no attributes need to be set apart from the ones that are already stored in a row.
Since some rows correspond to several DOM objects (for example, the text row implements the Text, Comment and CDATASection interfaces), the programmer cannot use the instanceof operator to determine the type of a given Node accurately. The recommended approach is to call the getNodeType() method to test the type of a Node.
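The point about testing node types can be tried out against the standard org.w3c.dom API, where the type code is returned by Node.getNodeType(). The sketch below uses the JDK's own DOM parser rather than AXIOM, and the class name NodeTypeDemo is made up for this example. Even in the JDK, CDATASection extends Text, so an instanceof Text check alone cannot tell the two apart.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class NodeTypeDemo {
    // Labels a node by its type code instead of relying on instanceof.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element";
            case Node.TEXT_NODE:
                return "text";
            case Node.COMMENT_NODE:
                return "comment";
            case Node.CDATA_SECTION_NODE:
                return "cdata";
            default:
                return "other";
        }
    }

    // Parses the XML and lists the types of the root element's children.
    static String describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder labels = new StringBuilder();
            // Sibling traversal avoids asking the child list for its length.
            for (Node child = doc.getDocumentElement().getFirstChild();
                 child != null;
                 child = child.getNextSibling()) {
                if (labels.length() > 0) {
                    labels.append(',');
                }
                labels.append(describe(child));
            }
            return labels.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }
}
```

For the input `<a>hi<!--note--><![CDATA[raw]]></a>`, describeChildren returns `text,comment,cdata` with the default parser settings.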
Certain other data structures that are part of the DOM API (such as the NamedNodeMap interface, which is used to return the set of attributes of an element) are implemented in an iterative manner. This makes iterative access efficient, but if the user calls a method such as getLength(), the underlying logic has to complete a full cycle to determine that information. This erodes the performance and memory-footprint advantage gained from pull parsing, so it is recommended that such methods be avoided as much as possible. The next section describes the collection class behavior further.
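To make the cost of getLength() concrete, here is a minimal sketch of a lazily filled collection. LazyNodeList and pulledSoFar() are hypothetical names, and a plain java.util.Iterator stands in for the underlying source of items. Indexed access pulls only as many items as it needs, while getLength() has to drain the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical lazily filled list: items are pulled from the source only on demand.
final class LazyNodeList<T> {
    private final Iterator<T> source;
    private final List<T> seen = new ArrayList<>();

    LazyNodeList(Iterator<T> source) {
        this.source = source;
    }

    // Pulls just enough items to reach the requested index; null if out of range.
    T item(int index) {
        if (index < 0) {
            return null;
        }
        while (seen.size() <= index && source.hasNext()) {
            seen.add(source.next());
        }
        return index < seen.size() ? seen.get(index) : null;
    }

    // The length is unknown until the source is exhausted, so this drains it.
    int getLength() {
        while (source.hasNext()) {
            seen.add(source.next());
        }
        return seen.size();
    }

    // Number of items pulled from the source so far (for demonstration only).
    int pulledSoFar() {
        return seen.size();
    }
}
```

Calling item(0) on a four-item source pulls one item; calling getLength() afterwards pulls the remaining three.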
The build mechanism
The build mechanism is also important, since an incremental builder is used in the construction. The behavior and the scope of such a builder differ from those of a builder that constructs the whole document at once: because the document is not built in one step, the builder is required to be present all the time. StreamingOMBuilder, the builder in this case, has a reference to the pull parser that contains the document stream. It has a single public method (apart from its constructor) that processes one event of the pull parser. The builder's only responsibility is to insert and update data items. The different types of nodes themselves contain the logic required to grab the necessary data from the model and to call the builder to proceed as necessary. To encapsulate the underlying storage mechanism (tables in this case), an OMTableModel class is provided. Its task is to provide the necessary services to the other components, such as the builder, so that they can access and modify information without knowing what type of data structure is used underneath. The following class diagram depicts how the OMTableModel and the builder are arranged.
Note that since the XML nodes need to access the builder, each one of them should have a reference to the builder inside. Effectively this means that an instance of a builder should be passed on to certain DOM objects (such as the Element) during construction.
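The interplay between a node, the builder and the model might look roughly like the sketch below. It is not the StreamingOMBuilder itself: the class names, the proceed() method and the model's methods are assumptions, and the JDK's StAX XMLStreamReader is used as a stand-in for the pull parser. The node holds a builder reference and asks the builder to consume events, one at a time, until the data it needs has arrived.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for the model: hides how the data is stored.
final class OMTableModelSketch {
    private final List<String> elementNames = new ArrayList<>();

    void addElement(String name) {
        elementNames.add(name);
    }

    int elementCount() {
        return elementNames.size();
    }

    String elementName(int index) {
        return elementNames.get(index);
    }
}

// Incremental builder: each call to proceed() consumes exactly one parser event.
final class StreamingBuilderSketch {
    private final XMLStreamReader parser;
    private final OMTableModelSketch model;

    StreamingBuilderSketch(String xml, OMTableModelSketch model) {
        XMLStreamReader reader;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        this.parser = reader;
        this.model = model;
    }

    // Returns false once the document is exhausted.
    boolean proceed() {
        try {
            if (!parser.hasNext()) {
                return false;
            }
            if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                model.addElement(parser.getLocalName());
            }
            return true;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}

// A node keeps a builder reference and drives it until its own data is available.
final class LazyElementSketch {
    private final int index;
    private final OMTableModelSketch model;
    private final StreamingBuilderSketch builder;

    LazyElementSketch(int index, OMTableModelSketch model, StreamingBuilderSketch builder) {
        this.index = index;
        this.model = model;
        this.builder = builder;
    }

    // Returns the element name, or null if the document ends before this element.
    String getName() {
        while (model.elementCount() <= index && builder.proceed()) {
            // keep consuming events until the row for this element exists
        }
        return index < model.elementCount() ? model.elementName(index) : null;
    }
}
```

With the input `<a><b/><c/></a>`, asking the element at index 1 for its name causes only the events up to `<b/>` to be consumed; `<c/>` stays unparsed until something asks for it.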
Other mechanisms to improve performance
Object pooling mechanism for rows and tables
An object pooling mechanism can be used to improve the performance further. Pools of row and table objects are created and kept ready to service requests. Once a pooled object has been used, it should be reset and returned to the pool. The gain from this approach, however, will only be visible under heavy load.
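A minimal sketch of such a pool is shown below, assuming a hypothetical PooledRow type with a reset() method. It is single-threaded for simplicity; a pool shared between request threads would need synchronization or a concurrent queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical poolable row with two illustrative fields.
final class PooledRow {
    int key = -1;
    String value;

    // Clears all state so the row can be reused safely.
    void reset() {
        key = -1;
        value = null;
    }
}

// Simple, non-thread-safe pool of row objects.
final class RowPool {
    private final Deque<PooledRow> free = new ArrayDeque<>();

    // Reuses a pooled row when one is available, otherwise creates a new one.
    PooledRow acquire() {
        PooledRow row = free.poll();
        return row != null ? row : new PooledRow();
    }

    // Resets the row and pushes it back to the pool.
    void release(PooledRow row) {
        row.reset();
        free.push(row);
    }

    int available() {
        return free.size();
    }
}
```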
OM Prototypes Implementation Progress
So far we have implemented four prototypes of OM. I think it is now time to look back and review our designs in light of the experience we gained from those implementations.
- Dasarath_Chinthaka_Ajith_Deepal implementation: This was the initial model, and it implements the table model as rows of a table. More details about this design can be found here.
- Dasarath's implementation: This model introduces two major differences into the design: 1. doubly linked lists for siblings; 2. a JDOM-like API, removing the tight DOM integration from the OM model.
- Ajith's implementation: Ajith implements the OM model with the column-like table model found in Xalan DTM, exposed through a JDOM-like API.
- Chinthaka's implementation: This extends Dasarath's implementation by improving the child traversal model.
So now we have four OM implementations.
Next ..... ?
We have four implementations of the OM model, each made efficient in a different context. The following points should be noted for future work on the OM model.
- Decoupling of the DOM API from the OM model
I think one of the things we should avoid in our designs is the tight DOM integration we had. The major reason for it was to support security on top of DOM. However, one of our mistakes was to offer only the DOM API, which is more cumbersome to work with. IMO, performance will be slow anyway when security is used, so why bother integrating DOM into OM itself? We (I think Glen also) would like OM to have a JDOM-like API, which is easier to use, and this is one of the things Dasarath has done in his implementation. So the better approach is for OM to have a programmer-friendly API like JDOM, and to wrap OM so that it supports the DOM API. In this wrapping, the DOM wrapper can point to the OM model, which reduces the memory footprint and improves performance.
- Since we have four implementations of OM, we can now finalize the OM API. One can select one of the four implementations (or implement one's own), depending on the context in which OM is used.
The proposed API for OM is available in chinthaka_ajith scratch area, as a suggestion.
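The wrapping idea discussed above, a JDOM-like OM with a DOM-style wrapper that points into it, could be sketched along these lines. OmElement and DomView are hypothetical names, and DomView mimics only three DOM-style methods rather than implementing the real org.w3c.dom.Element interface. The wrapper stores references into the OM tree and copies no data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical JDOM-like OM element.
final class OmElement {
    private final String name;
    private final List<OmElement> children = new ArrayList<>();

    OmElement(String name) {
        this.name = name;
    }

    // Returns this element so calls can be chained.
    OmElement addChild(OmElement child) {
        children.add(child);
        return this;
    }

    String getName() {
        return name;
    }

    List<OmElement> getChildren() {
        return children;
    }
}

// DOM-style view: holds references into the OM tree instead of copying it.
final class DomView {
    private final OmElement target;
    private final OmElement parent;  // null for the root
    private final int position;      // index of target among the parent's children

    private DomView(OmElement target, OmElement parent, int position) {
        this.target = target;
        this.parent = parent;
        this.position = position;
    }

    static DomView of(OmElement root) {
        return new DomView(root, null, 0);
    }

    String getNodeName() {
        return target.getName();
    }

    DomView getFirstChild() {
        List<OmElement> children = target.getChildren();
        return children.isEmpty() ? null : new DomView(children.get(0), target, 0);
    }

    DomView getNextSibling() {
        if (parent == null) {
            return null;
        }
        int next = position + 1;
        List<OmElement> siblings = parent.getChildren();
        return next < siblings.size() ? new DomView(siblings.get(next), parent, next) : null;
    }
}
```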