What does XMLBeans bring to the table?

It's been asked: isn't this all old hat? Don't we already have XML validators, XML<->Java binding technologies, etc.?

XMLBeans does several things...

(1) It's not just a validator. It's a faithful, programmable implementation of the entire XML schema type system in Java. It provides all the things the schema spec requires. For example, if your schema has a complex type with simple content that extends a restriction of a list of xsd:ints, no sweat. You can get instances of that type that behave exactly as you'd want.

This can be separated from the schema->java compiler. Even without compiling a schema, you can load and use runtime instances with strong schema support built right into them.

For example:

The fact that it happens to be able to validate very accurately is a given. What was surprising to us is that we've found it to be a faster validator than other implementations, even though the schema support code is designed with a broader scope.

(2) The same characterization applies to comparisons with JAXB, Castor, etc. It's not just a lossy Java<->XML translation tool. It is a full implementation of the XML infoset plus XML post-schema-validation infoset in Java. Some details:

(3) It brings a useful new approach for dealing with XML instances. DOM, JDOM, and XOM all take the approach of providing an object for each node in the infoset. We've found that doing this upfront is expensive. So XMLBeans stores instances by allocating a much smaller number of objects.

It has an architecture where needed objects can be allocated on-demand and attached to the store. The strongly-typed (bound to Java class) objects are allocated this way. We've found this to be not only fast, but quite flexible. It gives us an architecture where we can easily provide different "views" on the same data and keep them in sync all the time.
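The idea of a compact store with typed objects attached on demand can be illustrated with a small sketch. This is not the XMLBeans implementation: the names CompactStore and IntView are hypothetical, the "store" is just two parallel lists, and only the JDK is assumed. It shows a typed view being allocated the first time it is requested, and the text view and the typed view staying in sync because both read the same underlying data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical compact store: names and text live in parallel lists, not per-node objects. */
final class CompactStore {
    private final List<String> names = new ArrayList<>();
    private final List<String> texts = new ArrayList<>();
    // Typed view objects are created only when someone asks for them.
    private final Map<Integer, IntView> views = new HashMap<>();

    /** Appends an element and returns its index in the store. */
    int add(String name, String text) {
        names.add(name);
        texts.add(text);
        return names.size() - 1;
    }

    String name(int index) { return names.get(index); }
    String text(int index) { return texts.get(index); }
    void setText(int index, String text) { texts.set(index, text); }

    /** Allocates a typed view on first request and reuses it afterwards. */
    IntView intView(int index) {
        return views.computeIfAbsent(index, i -> new IntView(this, i));
    }

    /** Number of view objects allocated so far. */
    int allocatedViews() { return views.size(); }
}

/** Hypothetical typed view attached to the store; it holds no copy of the data. */
final class IntView {
    private final CompactStore store;
    private final int index;

    IntView(CompactStore store, int index) {
        this.store = store;
        this.index = index;
    }

    int get() { return Integer.parseInt(store.text(index).trim()); }
    void set(int value) { store.setText(index, Integer.toString(value)); }
}

final class OnDemandViewDemo {
    public static void main(String[] args) {
        CompactStore store = new CompactStore();
        int qty = store.add("quantity", "3");
        store.add("price", "20");

        IntView view = store.intView(qty);   // first request allocates the view
        view.set(view.get() + 1);            // update through the typed view
        System.out.println(store.text(qty)); // the raw text reflects the update
        System.out.println(store.allocatedViews());
    }
}
```

Because the view keeps only a reference to the store and an index, there is nothing to copy back and forth, which is one simple way to keep several views consistent.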

But we've also found that the performance gap to DOM is large enough that you really want a cursor-oriented way of exploring the XML infoset. So XMLBeans also proposes a novel solution here in its XmlCursor API. By avoiding object allocation, you can process raw instances in-memory in a random-access way faster than you could with a DOM.
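To make the cursor idea concrete, here is a minimal sketch of cursor-style navigation over a flat, array-based tree. It is an illustration under stated assumptions, not the XmlCursor API itself: FlatTree and TreeCursor are hypothetical names, only the JDK is used, and the tree holds element names only. The point is that moving around the tree changes one integer position and allocates no node objects.

```java
import java.util.Arrays;

/** Hypothetical flat tree: each node is a row in parallel arrays, -1 means "none". */
final class FlatTree {
    final String[] name;
    final int[] parent;
    final int[] firstChild;
    final int[] nextSibling;

    /** Builds child and sibling links from a parent array given in document order. */
    FlatTree(String[] name, int[] parent) {
        int n = name.length;
        this.name = name.clone();
        this.parent = parent.clone();
        this.firstChild = new int[n];
        this.nextSibling = new int[n];
        int[] lastChild = new int[n];
        Arrays.fill(this.firstChild, -1);
        Arrays.fill(this.nextSibling, -1);
        Arrays.fill(lastChild, -1);
        for (int i = 0; i < n; i++) {
            int p = parent[i];
            if (p < 0) continue;                 // root has no parent
            if (this.firstChild[p] < 0) this.firstChild[p] = i;
            else this.nextSibling[lastChild[p]] = i;
            lastChild[p] = i;
        }
    }
}

/** Hypothetical cursor: a single movable position over the flat tree. */
final class TreeCursor {
    private final FlatTree tree;
    private int pos = 0; // starts at the root

    TreeCursor(FlatTree tree) { this.tree = tree; }

    String name() { return tree.name[pos]; }
    boolean toFirstChild() { return moveTo(tree.firstChild[pos]); }
    boolean toNextSibling() { return moveTo(tree.nextSibling[pos]); }
    boolean toParent() { return moveTo(tree.parent[pos]); }

    // A failed move leaves the cursor where it was and reports false.
    private boolean moveTo(int target) {
        if (target < 0) return false;
        pos = target;
        return true;
    }
}

final class CursorDemo {
    /** Counts the root's direct children with the given name. */
    static int countChildren(FlatTree tree, String childName) {
        TreeCursor c = new TreeCursor(tree);
        int count = 0;
        if (c.toFirstChild()) {
            do {
                if (c.name().equals(childName)) count++;
            } while (c.toNextSibling());
        }
        return count;
    }

    public static void main(String[] args) {
        // order -> (item -> sku), item, note
        FlatTree tree = new FlatTree(
            new String[] {"order", "item", "item", "note", "sku"},
            new int[] {-1, 0, 0, 0, 1});
        System.out.println(countChildren(tree, "item"));
    }
}
```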

(4) It does other interesting things too. For example, it can literally make the XML Schema type system part of the Java type system, and you can manage it the same way. You can compile schema types and put them on your classpath. Just as you have ClassLoaders in Java, you can have SchemaTypeLoaders. Just as you can use two classes together by putting them on the classpath together, you can use two schemas together by putting them on the classpath together.
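The ClassLoader analogy can be sketched in a few lines. The following is a toy model only, not the SchemaTypeLoader interface: TypeLoader, MapTypeLoader and UnionTypeLoader are hypothetical names, types are represented as plain strings, and only the JDK is assumed. It shows how two independently built sets of types can be used together by searching their loaders in order, much as a classpath is searched entry by entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical loader: resolves a qualified type name to a type description. */
interface TypeLoader {
    Optional<String> findType(String qualifiedName);
}

/** Loader backed by one fixed set of types, like one entry on a classpath. */
final class MapTypeLoader implements TypeLoader {
    private final Map<String, String> types;

    MapTypeLoader(Map<String, String> types) {
        this.types = new HashMap<>(types); // defensive copy
    }

    @Override
    public Optional<String> findType(String qualifiedName) {
        return Optional.ofNullable(types.get(qualifiedName));
    }
}

/** Searches several loaders in order; the first one that knows the name wins. */
final class UnionTypeLoader implements TypeLoader {
    private final List<TypeLoader> loaders;

    UnionTypeLoader(List<TypeLoader> loaders) {
        this.loaders = new ArrayList<>(loaders); // defensive copy
    }

    @Override
    public Optional<String> findType(String qualifiedName) {
        for (TypeLoader loader : loaders) {
            Optional<String> hit = loader.findType(qualifiedName);
            if (hit.isPresent()) return hit;
        }
        return Optional.empty();
    }
}
```

A union built from an "orders" loader and a "customers" loader would resolve names from either one, and an unknown name would simply come back empty.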

That's a whole interesting area of its own.

At any rate, in summary: I think one of the reasons there is so much duplication in this area is that the area is so important to the community at large.

Data binding is all about bridging the gap between different abstractions, and it's a major pain point for application developers: nobody wants to worry about it, but everybody ends up spending a lot of time on it, whether it's Java<->XML or Java<->SQL or whatever.

The right hub for solving this problem is really (imho) XML for instances and XML Schema for metadata, so I think, by taking that XML+Schema problem on in a direct and complete way, XMLBeans provides some technology that will help us improve and change the developer tool landscape.

Some thoughts on future directions for XMLBeans are collected together at XmlBeansRoadMap.

David Bau

david.bau (at) bea.com

XmlBeansExplanation (last edited 2009-09-20 23:32:42 by localhost)