What does XMLBeans bring to the table?
It's been asked: isn't this all old hat? Don't we already have XML validators, XML<->Java binding technologies, etc.?
XMLBeans does several things...
(1) It's not just a validator. It's a faithful, programmable implementation of the entire XML Schema type system in Java. It provides all the things the schema spec requires. For example, if your schema has a complex type with simple content that extends a restriction of a list of xsd:ints, no sweat. You can get instances of that type that behave exactly as you'd want.
This can be separated from the schema->Java compiler. Even without compiling a schema, you can load and use runtime instances with strong schema support built right into them.
For example:
- It doesn't just use schema to help you read instances. It helps you build schema-compliant instances, with both simple and complex content models.
- It doesn't just validate dates, it can help you do schema-compliant date math and comparisons.
- It doesn't just validate when you parse. It can validate subtrees of instances that you are manipulating. This is tremendously useful when debugging.
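To make the date point concrete, here is a minimal sketch of what "schema-compliant date math and comparisons" means. It deliberately uses the JDK's javax.xml.datatype package as a stand-in, not XMLBeans's own date classes, so it runs without any extra jars. The class name SchemaDateMath and its two helper methods are made up for illustration.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

// Illustrative stand-in using JDK classes only; this is not the XMLBeans API.
class SchemaDateMath {
    private static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adds an xsd:duration to an xsd:date or xsd:dateTime, both given in lexical form. */
    static String addDuration(String lexicalDate, String lexicalDuration) {
        DatatypeFactory f = factory();
        XMLGregorianCalendar date = f.newXMLGregorianCalendar(lexicalDate);
        Duration duration = f.newDuration(lexicalDuration);
        date.add(duration); // mutates the calendar in place
        return date.toXMLFormat();
    }

    /** True when the schema's partial order cannot rank the two values. */
    static boolean isIndeterminate(String a, String b) {
        DatatypeFactory f = factory();
        int result = f.newXMLGregorianCalendar(a).compare(f.newXMLGregorianCalendar(b));
        return result == DatatypeConstants.INDETERMINATE;
    }

    public static void main(String[] args) {
        // One month after January 31st, using the schema's own addition rules.
        System.out.println(addDuration("2003-01-31", "P1M"));
        // A value without a time zone compared against the same clock time in UTC.
        System.out.println(isIndeterminate("2003-01-01T12:00:00", "2003-01-01T12:00:00Z"));
    }
}
```

Two things a plain validator never has to care about show up here. Adding P1M to 2003-01-31 clamps the day to the end of February, as the schema spec's duration-addition algorithm requires. And a dateTime with no time zone cannot be ordered against the same clock time with a zone, so the comparison is "indeterminate" rather than less, equal, or greater.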
The fact that it happens to be able to validate very accurately is a given. What was surprising to us is that we've found it to be a faster validator than other implementations, even though the schema support code is designed with a broader scope.
(2) The same characterization applies to comparisons with JAXB, Castor, etc. It's not just a lossy Java<->XML translation tool. It is a full implementation of the XML infoset plus XML post-schema-validation infoset in Java. Some details:
- It is fully faithful to the XML Schema spec, so, for example, "instanceof" works as it should 100% of the time.
- It's also fully faithful to the way XML Schema works in instances, so, for example, it does type substitution, element substitution, etc. in a very clean way. For a number of detailed reasons, these things are not as easy or robust in JAXB.
- It is fully faithful to the XML infoset, so, for example, it doesn't just go and drop comments or whitespace because they don't happen to bind to Java primitive types. It keeps and synchronizes the entire XML infoset, so at any time you can say, "Did this element have an extensibility attribute on it?" "Did it have a comment?" "Let me change that comment." You have to use it to really know it, but it's really a beautiful thing to be able to drop from saying int i = xobj.getQuantity(); immediately to manipulating comment nodes on xobj. This is also extremely important for situations when you're dealing with wildcards and so on.
- It's robust enough that it is a self-hosted XML compiler. In particular, we don't have a hand-built schema parser. We use the automatically compiled XMLBean, compiled from schema-for-schema, to parse schema. I'm not aware of other binding tools that have enough depth of schema support to do that.
(3) It brings a useful new approach for dealing with XML instances. DOM, JDOM, and XOM all take the approach of providing an object for each node in the infoset. We've found that doing this upfront is expensive. So XMLBeans stores instances by allocating a much smaller number of objects.
It has an architecture where needed objects can be allocated on-demand and attached to the store. The strongly-typed (bound to Java class) objects are allocated this way. We've found this to be not only fast, but quite flexible. It gives us an architecture where we can easily provide different "views" on the same data and keep them in sync all the time.
But we've also found that the performance gap to DOM is large enough that you really want a cursor-oriented way of exploring the XML infoset. So XMLBeans also proposes a novel solution here in its XmlCursor API. By avoiding object allocation, you can process raw instances in-memory in a random-access way faster than you could with a DOM.
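The ideas in point (3) can be sketched with a toy in plain Java. This is not how the XMLBeans store is actually implemented, and none of these classes exist in XMLBeans: FlatStore, Cursor, QuantityView and StoreDemo are hypothetical names, and the cursor method names only loosely echo the style of a cursor API. The sketch shows a tree held in parallel arrays with no node objects, a cursor that navigates by changing a single int, and a typed view that is allocated on demand and stays in sync because it holds no data of its own.

```java
import java.util.Arrays;

// Hypothetical toy, not XMLBeans code: every class and method here is illustrative.
class StoreDemo {
    /** Builds <order> with two <quantity> children holding "2" and "3". */
    static FlatStore sampleOrder() {
        FlatStore s = new FlatStore();
        int order = s.add(-1, "order", null);
        s.add(order, "quantity", "2");
        s.add(order, "quantity", "3");
        return s;
    }

    /** Sums every <quantity> child of the root by moving a single cursor. */
    static int totalQuantity(FlatStore store) {
        Cursor c = new Cursor(store);
        int total = 0;
        if (c.toFirstChild()) {
            do {
                if ("quantity".equals(c.getName())) {
                    total += Integer.parseInt(c.getTextValue());
                }
            } while (c.toNextSibling());
        }
        return total;
    }

    public static void main(String[] args) {
        FlatStore store = sampleOrder();
        System.out.println(totalQuantity(store));
        store.viewAt(1).setQuantity(10);          // write through the typed view
        System.out.println(totalQuantity(store)); // the cursor reads the same store
    }
}

/** The whole tree lives in parallel arrays, so no per-node objects are created. */
class FlatStore {
    private String[] names = new String[4];
    private String[] texts = new String[4];
    private int[] parents = new int[4];
    private QuantityView[] views = new QuantityView[4]; // filled lazily
    private int size = 0;

    /** Appends an element under parent (-1 for the root) and returns its index. */
    int add(int parent, String name, String text) {
        if (size == names.length) {
            int cap = size * 2;
            names = Arrays.copyOf(names, cap);
            texts = Arrays.copyOf(texts, cap);
            parents = Arrays.copyOf(parents, cap);
            views = Arrays.copyOf(views, cap);
        }
        names[size] = name;
        texts[size] = text;
        parents[size] = parent;
        return size++;
    }

    String name(int node) { return names[node]; }
    String text(int node) { return texts[node]; }
    void setText(int node, String text) { texts[node] = text; }
    int parent(int node) { return parents[node]; }

    /** Next index after 'from' whose parent is 'parent', or -1 if there is none. */
    int nextWithParent(int from, int parent) {
        for (int i = from + 1; i < size; i++) {
            if (parents[i] == parent) return i;
        }
        return -1;
    }

    /** Typed view, allocated only on first request and then cached on the store. */
    QuantityView viewAt(int node) {
        if (views[node] == null) views[node] = new QuantityView(this, node);
        return views[node];
    }
}

/** Movable position over the store: navigating changes one int and allocates nothing. */
class Cursor {
    private final FlatStore store;
    private int pos = 0; // starts at the root element

    Cursor(FlatStore store) { this.store = store; }

    boolean toFirstChild() { return moveTo(store.nextWithParent(pos, pos)); }
    boolean toNextSibling() { return moveTo(store.nextWithParent(pos, store.parent(pos))); }
    boolean toParent() { return moveTo(store.parent(pos)); }
    String getName() { return store.name(pos); }
    String getTextValue() { return store.text(pos); }

    private boolean moveTo(int target) {
        if (target < 0) return false; // stay put when there is nowhere to go
        pos = target;
        return true;
    }
}

/** Strongly typed view over one node; it holds no data of its own. */
class QuantityView {
    private final FlatStore store;
    private final int node;

    QuantityView(FlatStore store, int node) { this.store = store; this.node = node; }

    int getQuantity() { return Integer.parseInt(store.text(node)); }
    void setQuantity(int quantity) { store.setText(node, Integer.toString(quantity)); }
}
```

In this toy, walking the whole document costs one Cursor object no matter how many elements there are, and a QuantityView exists only for nodes that someone asked about. The linear sibling scan is a simplification to keep the sketch short. A real store would index its structure.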
(4) It does other interesting things too. For example, it can literally make the XML Schema type system part of the Java type system, and you can manage it the same way. You can compile schema types and put them on your classpath. Just as you have ClassLoaders in Java, you can have SchemaTypeLoaders. Just as you can use two classes together by putting them on the classpath together, you can use two schemas together by putting them on the classpath together.
That's a whole interesting area of its own.
At any rate, in summary: I think one of the reasons there is so much duplication in this area is that the area is so important to the community at large.
Data binding is all about bridging the gap between different abstractions, and it's a major pain point for application developers: nobody wants to worry about it, but everybody ends up spending a lot of time on it, whether it's Java<->XML or Java<->SQL or whatever.
The right hub for solving this problem is really (imho) XML for instances and XML Schema for metadata, so I think, by taking that XML+Schema problem on in a direct and complete way, XMLBeans provides some technology that will help us improve and change the developer tool landscape.
Some thoughts on future directions for XMLBeans are collected together at XmlBeansRoadMap.
David Bau
david.bau (at) bea.com