WSDL 2.0 support for Axis2
Interested in knowing how I spend my time working on my Google Summer of Code project?
Well, here I present some snapshots of my journey. Hope you will enjoy reading about the road map, milestones, scenery, pot-holes and road blocks I encounter!
The short-term solution
Studied the Axis2 source in order to understand the changes that need to be made for WSDL2.0 support. Also, I did a bit of experimentation by plugging in the Woden API. However, Woden does not parse WSDL1.1, and WSDL4J (which Axis2 uses now) does not parse WSDL2.0! Therefore it's not just a matter of unplugging WSDL4J and plugging in Woden.
My mentor, Deepal Jayasingha, helped me identify the two classes WSDL11ToAxisServiceBuilder and WSDL20ToAxisServiceBuilder. Apparently these classes need to be used to populate the AxisService from the respective WSDL elements. The possible solutions suggested were:
Parse the WSDL1.1 documents with WSDL4J and WSDL2.0 documents with Woden. PRO: None of the rest of the Axis2 code needs to be changed. CON: Validating and recognizing WSDLs is not fully implemented yet in Woden, WSDL4J or Axis2.
Join the Woden team and provide WSDL1.1 capability there. PRO: Would provide the exact programming model needed. CON: Requires a lot of effort from Woden.
Switch over to Woden without the WSDL1.1 parsing functionality. There is a converter util to convert 1.1 documents to 2.0, so with the help of this tool WSDL4J can be eliminated completely. PRO: Minimum amount of work required! CON: Loss of information during the conversion may adversely affect the quality of the AxisService.
I hit a road block: there was no direct way of mapping the messages required in the AxisService from WSDL2.0, and the MEPs (message exchange patterns) defined in Axis2 and Woden were totally different. So, further study and experience with the WSDL2.0 specification was necessary.
Reading a WSDL from the archive
I explored how a WSDL is read into the Axis2 engine through the ArchiveReader and the DeploymentEngine. I wrote a little utility class called WSDLVersionDeterminer which determines the version of the WSDL. It is independent of the Woden and WSDL4J APIs and was intended to be included in the Axis2 source. The version is determined by checking the name of the first element.
The patch is attached at http://issues.apache.org/jira/browse/AXIS2-808.
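The idea of checking the first element can be sketched roughly as follows. This is not the code from the patch: the class name WsdlVersionSketch and its detect method are made up for illustration, and the sketch takes the WSDL as a string where the real utility would read it from the archive. It relies only on the fact that a WSDL 1.1 document has a root element named definitions while a WSDL 2.0 document has one named description.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; this is not the WSDLVersionDeterminer class from the patch.
class WsdlVersionSketch {

    /** Returns "1.1", "2.0" or "unknown", judged only by the local name of the root element. */
    static String detect(String wsdlText) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(wsdlText));
            try {
                // Skip the prolog (comments, processing instructions) up to the first start tag.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        String root = reader.getLocalName();
                        if ("definitions".equals(root)) return "1.1"; // WSDL 1.1 root element
                        if ("description".equals(root)) return "2.0"; // WSDL 2.0 root element
                        return "unknown";
                    }
                }
                return "unknown";
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("<description xmlns=\"http://www.w3.org/ns/wsdl\"/>"));
        System.out.println(detect("<wsdl:definitions xmlns:wsdl=\"http://schemas.xmlsoap.org/wsdl/\"/>"));
    }
}
```

Only the first start tag is read, so the cost stays the same however large the WSDL is. A stricter check would also compare the namespace URI of the root element.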
readWSDL() in particular...
The Woden API provides only one readWSDL method, which accepts only the URI of the WSDL document. Since service deployment in Axis2 is done as an AAR (Axis ARchive) file, this method signature is of no use. Since no other overloads of readWSDL are currently provided in the WSDLReader interface, the only solution which came to my mind was to extract the WSDL from the archive, write it as a file to a particular location on the local file system, and then give that location as the argument to readWSDL. Obviously this is NOT the way to go!
Therefore, taking the WSDL4J API as an example, I implemented an overload of readWSDL which takes a DOM document element as an additional argument and returns the DescriptionElement.
The patch is attached at http://issues.apache.org/jira/browse/WODEN-27.
The solution I had in mind for this problem is this:
WSDLReader reader = WSDLFactory.newInstance().newWSDLReader();
// in is an InputStream; in Axis2's case it could be the zipped input stream from the aar
Document doc = org.apache.axis2.util.XMLUtils.newDocument(in);
Element element = doc.getDocumentElement();
DescriptionElement desc = reader.readWSDL(null,element); //Note: the WSDL Uri could be null
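To show that no temporary file is needed for the first half of this, here is a small sketch that uses only JDK classes to pull a WSDL out of a zip archive and hand back its DOM document element. The class ArchiveWsdlSketch, its method names and the entry name META-INF/service.wsdl are assumptions made for the example; this is not how the ArchiveReader in Axis2 is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical sketch using JDK classes only; not the Axis2 ArchiveReader.
class ArchiveWsdlSketch {

    /** Returns the root element of the first *.wsdl entry in the archive, or null if none. */
    static Element readWsdlRoot(InputStream archive) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().toLowerCase().endsWith(".wsdl")) {
                    // Buffer the entry so that the XML parser cannot close the archive stream.
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return dbf.newDocumentBuilder()
                            .parse(new ByteArrayInputStream(buf.toByteArray()))
                            .getDocumentElement();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read a WSDL from the archive", e);
        }
    }

    /** Builds a tiny in-memory archive for demonstration; the entry name is an assumption. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("META-INF/service.wsdl"));
            zip.write("<description xmlns=\"http://www.w3.org/ns/wsdl\"/>"
                    .getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Element root = readWsdlRoot(new ByteArrayInputStream(sampleArchive()));
        System.out.println(root.getLocalName());
    }
}
```

The element returned here is the kind of argument that the proposed readWSDL overload would accept.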
The Woden developers who examined the patch agreed that the readWSDL method's signature should be extended. The existing readWSDL method, which takes the URI as a string, was only a starting point that let them focus on developing the object model and the parsing / validation logic. However, there were a few concerns about adding DOM dependencies directly to the WSDLReader API via methods like readWSDL(uri, Element) or readWSDL(uri, Document). They emphasized avoiding, if possible, any dependency in the Woden API on a particular XML parser or XML object model, or at least capturing any required dependency in a way that lets the API still reflect a choice of XML parsing strategy.
You can find their suggested solution here.
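To make the concern a bit more concrete, here is one way such a dependency could be captured. This is a hypothetical sketch and not their proposal: all the names in it (SourceWrapperSketch, WsdlSource, Reader, DomReader) are invented for the example. The parser-specific object travels inside an opaque wrapper, so the reader interface itself never mentions a DOM type.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

// All names below are hypothetical; none of them is taken from the Woden API.
class SourceWrapperSketch {

    /** Parser-neutral carrier: the reader contract only ever sees this type. */
    static final class WsdlSource {
        final String baseUri;  // may be null, e.g. for a WSDL that sits inside an archive
        final Object payload;  // DOM Element, stream reader, ...; opaque to the contract
        WsdlSource(String baseUri, Object payload) {
            this.baseUri = baseUri;
            this.payload = payload;
        }
    }

    /** The contract has no DOM (or StAX) types in its signature. */
    interface Reader {
        /** A real reader would return the WSDL model; the root name is enough here. */
        String readRootName(WsdlSource source);
    }

    /** One concrete reader; only this class knows about org.w3c.dom. */
    static final class DomReader implements Reader {
        public String readRootName(WsdlSource source) {
            if (!(source.payload instanceof Element)) {
                throw new IllegalArgumentException("DomReader expects an org.w3c.dom.Element");
            }
            return ((Element) source.payload).getLocalName();
        }
    }

    /** Creates a stand-alone DOM element to feed the reader in the demonstration. */
    static Element sampleElement() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
                    .createElementNS("http://www.w3.org/ns/wsdl", "description");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reader reader = new DomReader();
        System.out.println(reader.readRootName(new WsdlSource(null, sampleElement())));
    }
}
```

The price of this kind of design is that a wrong payload type is only detected at run time, which is why the reader checks it explicitly.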
Some findings about validation, etc
In the work I have done so far, one major bottleneck was the pull-parsing model (hmm.. guess that's what makes StAX so special) and the loss of information about the other elements as the cursor moves through the document. With DOM, parsing the WSDL is very easy, since the whole object model is held in memory.
StAX also poses huge problems when it comes to validating. Some of the validation assertions require information from other elements. So, if I create an interface and validate it, the assertion may need access to the rest of the interfaces in the model. This may cause more of the model to be loaded than expected. There may be ways to work around this by refactoring the validation model. Still, it is clear that the validator won't necessarily be able to play nice for all assertions. I am starting to think that validation may not make sense for a pull parsing model!
I also learnt that the validators are broken down in such a way that they can be invoked on a certain element, but they currently assume that the rest of the model is present. With StAX, we may want to create additional validation options that disable the assertions that perform deep checks. This should perform better for pull parsing, at the cost of not having complete validation.
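Such an option could look roughly like the sketch below. It is not based on the Woden validator classes: ValidationSketch, Assertion and the assertion ids are all made up. Each assertion carries a flag saying whether it needs more than the element itself, and the caller decides whether those deep checks run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; the class names and assertion ids are not Woden's.
class ValidationSketch {

    /** A named check; "deep" marks checks that need more than the element itself. */
    static final class Assertion<T> {
        final String id;
        final boolean deep;
        final Predicate<T> check;
        Assertion(String id, boolean deep, Predicate<T> check) {
            this.id = id;
            this.deep = deep;
            this.check = check;
        }
    }

    /** Runs the assertions and returns the ids of those that failed. */
    static <T> List<String> validate(T element, List<Assertion<T>> assertions, boolean includeDeep) {
        List<String> failures = new ArrayList<>();
        for (Assertion<T> a : assertions) {
            if (a.deep && !includeDeep) {
                continue; // skipped: running it would force more of the model to be loaded
            }
            if (!a.check.test(element)) {
                failures.add(a.id);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Assertion<String>> checks = new ArrayList<>();
        checks.add(new Assertion<String>("name-not-empty", false, name -> !name.isEmpty()));
        // Stand-in for a check across all interfaces; it always fails in this demo.
        checks.add(new Assertion<String>("name-unique-in-model", true, name -> false));
        System.out.println(validate("reservationInterface", checks, false));
        System.out.println(validate("reservationInterface", checks, true));
    }
}
```

With includeDeep set to false the second assertion is never evaluated, which is exactly the trade-off described above: faster for pull parsing, but incomplete.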
In the current implementation there are logically 4 phases to a WSDL 2.0 validation.
- XML well formedness check
- Schema validation
- WSDL 2.0 document validation
- WSDL 2.0 component validation
The first two phases are performed by the DOM parser, and a StAX parser will need to perform them as well. The last two are run after the Woden model is created, and rely only on the Woden and XmlSchema models. Schema validation depends on the parser. For DOM, Woden uses Xerces, which contains a schema validator; using it avoids having to rewrite all of the rules that are defined by XML Schema. But, as far as I can see, I would have to write my own schema validator for the StAX implementation, or else put in an ugly hack to make use of the current validator.
So, for the moment I will leave validation aside and concentrate on the prototype which will provide basic functionality to read the hotel-reservation.wsdl (the one given with the WSDL2.0 spec) into a StAX model.
Implementing a pure StAX parser was problematic
I initially started parsing a WSDL purely with a StAX XMLStreamReader to build the Woden element model. My idea was to cache the XMLStreamReader at every top level element as it was accessed. I wanted to use this cached parser in cases where later elements needed information from previously accessed elements.
However, I realized that when there are so many nested elements, this approach created many parser instances even when it was not required (i.e. when those elements could have been accessed with the current parser). And this was a major problem when it came to the schema validation.
The obvious solution - Use Axiom!
Since AXIOM is based on StAX, the resulting implementation would be fast and efficient, as is expected of a StAX parser. If one of the objectives of Woden is to be used in Axis2, I suppose using AXIOM in Woden would not be much of a problem :).
I implemented a prototype OMWSDLReader as an alternative to the DOMWSDLReader. For the schema, I used XmlSchema as in the current DOM impl. However, the arguments to the XmlSchemaCollection's read method posed a problem, and I could only come up with the following work-around.
// omElement is an OMElement which contains the <xs:schema> element
String elementString = omElement.toStringWithConsume();
byte[] bytes = elementString.getBytes();

// Deserialize from the byte array
InputStream inputStream = new ByteArrayInputStream(bytes);
InputSource inputSource = new InputSource(inputStream);

XmlSchemaCollection xsc = new XmlSchemaCollection();
XmlSchema schemaDef = xsc.read(inputSource, null);
This returned the correct XmlSchema, matching what was in the WSDL. However, unlike in the DOM impl, the other namespaces (apart from the targetNamespace) were not present as attributes on <xs:schema>. I wonder whether this could lead to a bug later in the model for schema in Woden!
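The reason the missing declarations matter is that a schema can refer to prefixes that are declared on an ancestor such as the WSDL root element, and these are lost when only the <xs:schema> subtree is serialized. One possible way around it is to copy the in-scope declarations onto the schema element before serializing it. The sketch below shows that idea with plain JDK DOM, simply because it can be run without extra libraries; it is not AXIOM code, and the class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch with plain DOM; the same walk could be done on another object model.
class InScopeNamespacesSketch {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Copies xmlns declarations from all ancestors onto the element; nearer ones win. */
    static void copyInScopeNamespaces(Element element) {
        for (Node n = element.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                boolean isDeclaration = XMLNS_NS.equals(a.getNamespaceURI());
                // Never override a declaration that is already closer to the element.
                if (isDeclaration && !element.hasAttributeNS(XMLNS_NS, a.getLocalName())) {
                    element.setAttributeNS(XMLNS_NS, a.getName(), a.getValue());
                }
            }
        }
    }

    /** Parses the text with a namespace-aware parser and returns the first xs:schema element. */
    static Element firstSchema(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return (Element) doc
                    .getElementsByTagNameNS("http://www.w3.org/2001/XMLSchema", "schema").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        Element schema = firstSchema(
                "<description xmlns=\"http://www.w3.org/ns/wsdl\" xmlns:tns=\"urn:example\">"
                + "<types><xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
                + " targetNamespace=\"urn:example\"/></types></description>");
        copyInScopeNamespaces(schema);
        System.out.println(schema.getAttribute("xmlns:tns"));
    }
}
```

The walk starts at the parent and moves outwards, so when the same prefix is declared twice the declaration nearest to the schema element is the one that is kept.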
There were several discussions on the Woden-dev list about how Axiom should be plugged into the Woden interface. The possible solutions were:
- Implement the Woden object model by extending Axiom elements.
- Build the Axiom object model from the parser and use it to populate the Woden model.
The first approach is preferred as it won't create two object models. But it requires someone to re-implement the Woden object model. So the best short term option is to go for the second one.
The plan I had, and more problems
I tried the 2nd option, where the AXIOM object model is built from the WSDL and the Woden interface is implemented from it. My idea is to have an OMWSDLReader which does not have any DOM dependencies and uses AXIOM to get the elements in the WSDL and parse them into the Woden specific objects. This approach is quite easy, and the current implementation in Woden seems to support it.
However, when I was trying to handle the extension elements and attributes, I came across several classes such as ExtensionDeserializer, XMLAttrImpl (and in fact a whole bunch of xxxAttrImpl classes) which seem to be heavily dependent on DOM.
This was a blocker for me, and I wanted to know whether it's possible to work around these classes, or whether it would be possible for the init and convert methods to take in OMElements. I really couldn't grok the logic behind handling the extension attributes and elements in Woden, so I was seeking any insight on that as well.
The most significant part of large WSDLs will be the schema part. Since Woden relies on ws-commons XmlSchema for schema parsing, and that relies on DOM, this could be a bottleneck even if the WSDL is read using AXIOM. So to boost overall performance there was a need to look at XmlSchema using AXIOM/StAX instead of DOM.
I opened up a JIRA at http://issues.apache.org/jira/browse/WODEN-37, and attached a patch that provides some initial StAX based parsing through the OMWSDLReader and several util classes. This follows the same structure as the DOM model. However, several parsing methods are yet to be added, awaiting the abstraction of the Woden object model.
Woden telecon outcome
I told them about the patch I had sent and asked whether anybody had had a chance to review it. Since they were all getting ready for the interop, nobody had had the time to do so.
I also clarified several problems I had, especially about the current model downloading the whole schema for schema. They assured me that this is done to import the implicitly available XML Schema simple types, and that it will be replaced by a more elegant and performant solution.
They also asked me about the need to remove all the DOM dependencies. Their point is that some WSDL elements can contain arbitrary XML, and Woden doesn't provide a solution for this yet. They asked me the following questions, and I need to find the answers in order to convince them that Axiom is better than DOM!
- What is so bad about having a DOM element?
- Can it represent arbitrary XML?
- Is it possible when using StAX to parse the content into DOM elements? (Since DOM is just an interface.)
- What are my ideas on how to represent mixed type elements in Woden?
And the meeting ended with the conclusion that they may want to preserve arbitrary unknown extensions as well and to use the DOM API instead of reinventing the same functionality.
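On the third question, the JDK itself suggests that the answer is yes, at least from JAXP 1.4 (Java 6) onwards: an identity transform can replay the events of an XMLStreamReader into a DOM tree through StAXSource and DOMResult. The sketch below pulls events until it reaches a given element and then copies only that subtree into DOM. The class and method names are made up, and how this would plug into Woden or AXIOM is left open.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch using JDK classes only (needs JAXP 1.4 / Java 6 or later).
class StaxToDomSketch {

    /** Pulls events until the named element starts, then copies that subtree into DOM. */
    static Element subtreeAsDom(String xml, String localName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamReader.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    DOMResult result = new DOMResult();
                    // Identity transform: replays the StAX events of this element as DOM nodes.
                    TransformerFactory.newInstance().newTransformer()
                            .transform(new StAXSource(reader), result);
                    return ((Document) result.getNode()).getDocumentElement();
                }
            }
            return null; // no element with that local name in the document
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not convert the subtree to DOM", e);
        }
    }

    public static void main(String[] args) {
        String wsdl = "<description xmlns=\"http://www.w3.org/ns/wsdl\">"
                + "<documentation>demo</documentation>"
                + "<ext:policy xmlns:ext=\"urn:example:ext\"><ext:level>42</ext:level></ext:policy>"
                + "</description>";
        Element extension = subtreeAsDom(wsdl, "policy");
        System.out.println(extension.getNodeName() + " -> " + extension.getTextContent());
    }
}
```

Only the unknown extension ends up in DOM here; everything before it is consumed as plain pull events.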
Before applying my patch to the Woden SVN, they ran it against the AllWodenTests test suite and got 46 JUnit failures, most of which seem to indicate that a null DescriptionElement was returned!
This is expected, as I have not implemented the readWSDL(uri, errorhandler) method, and all the tests are based on that.
So, now that the patch is committed, I guess I will have to fine-tune the code and make sure all the test cases are handled. :)