XMLBeans v2 Binding Architecture

This is a work in progress and is currently just a very rough sketch. If you add notes below or have other comments and Q's, please be sure to post to xmlbeans-dev@xml.apache.org.

This architecture should be kept in sync with the feature plans at XmlBeansFeaturePlan.

User Model

The basic form of binding compilation supported by XmlBeans v2 will still produce a JAR that contains both Java and XML types as well as the information on how they are bound to each other. Let's call that JAR for now, a "tylar" (a "type library"; who knows; or come up with your own cooler term and we'll search-and-replace.)

A tylar contains:

  1. The compiled Java classes
  2. The compiled schema types
  3. The compiled binding information between Java and Schema (this may be both data and code)

This is not that different yet from v1.

However, the XmlBeans v2 binding toolset will support the start-from-Java use cases, e.g., those specified by JSR 101 and 109, the start-from-schema use cases specified by JAXB and XmlBeans v1, and it will also support a simplistic start-from-both-java-and-XML use model. It will also support pluggable forms of binding.

To help reduce the complexity of supporting all these models in a unified way, java/xml binding will be addressed through two separate levels of compilation and two separate styles of runtime binding.

Two Phases of Compilation

There are two phases of compilation:

  1. Type translation
  2. Type binding

The two phases are completely separate from each other. The first phase compiles its results into a "tylar source code" which contains both java classes and xml schema types, as well as a source file that describes the binding between them, but on its own may not support marshalling+unmarshalling the Java and XML.

The second phase produces a "binding type binary" in which the Java and XML can actually be bound and used together. Here the binding source code is processed, and any additional serialization code and/or computed configuration is produced to support the binding runtime.
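The separation of the two phases can be sketched in Java roughly as follows. This is only an illustration: every type name here (TylarSource, CompiledTylar, TypeTranslator, ToySchemaTranslator, BindingCompiler) is made up for the example and is not part of any XMLBeans API, and the "generated" artifacts are just strings standing in for real output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical phase 1 output: types plus a binding description, but no marshalling code. */
final class TylarSource {
    final List<String> javaClasses;
    final List<String> schemaTypes;
    final String bindingConfig;

    TylarSource(List<String> javaClasses, List<String> schemaTypes, String bindingConfig) {
        this.javaClasses = javaClasses;
        this.schemaTypes = schemaTypes;
        this.bindingConfig = bindingConfig;
    }
}

/** Hypothetical phase 2 output: the tylar source plus generated runtime support. */
final class CompiledTylar {
    final TylarSource source;
    final List<String> generatedArtifacts;

    CompiledTylar(TylarSource source, List<String> generatedArtifacts) {
        this.source = source;
        this.generatedArtifacts = generatedArtifacts;
    }
}

/** Phase 1, type translation: one implementation per front end. */
interface TypeTranslator {
    TylarSource translate(List<String> originalCode);
}

/** Toy start-from-schema front end: each .xsd yields one class and one .xsb. */
final class ToySchemaTranslator implements TypeTranslator {
    public TylarSource translate(List<String> originalCode) {
        List<String> classes = new ArrayList<>();
        List<String> xsbs = new ArrayList<>();
        for (String xsd : originalCode) {
            String base = xsd.substring(0, xsd.length() - ".xsd".length());
            classes.add(base + ".class");
            xsbs.add(base + ".xsb");
        }
        return new TylarSource(classes, xsbs, "<binding-config/>");
    }
}

/** Phase 2, type binding: does not care which front end produced its input. */
final class BindingCompiler {
    CompiledTylar bind(TylarSource source) {
        List<String> generated = new ArrayList<>();
        for (String javaClass : source.javaClasses) {
            // Stand-in for generating serialization code for each bound class.
            generated.add(javaClass.replace(".class", "Marshaller.class"));
        }
        return new CompiledTylar(source, generated);
    }
}

final class TwoPhaseDemo {
    public static void main(String[] args) {
        List<String> original = new ArrayList<>();
        original.add("PurchaseOrder.xsd");
        TylarSource source = new ToySchemaTranslator().translate(original);
        CompiledTylar compiled = new BindingCompiler().bind(source);
        System.out.println(source.javaClasses + " -> " + compiled.generatedArtifacts);
    }
}
```

The point of the shape is that BindingCompiler only ever sees a TylarSource, so a new front end can be added without touching the second phase.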

This is not dissimilar from JSR 109's framework in which type translation is done during development, yet type binding is done during deployment. (Actually, in practice, our type translation may be repeated during 109's deployment phase, but let's work out those details when we get to them....) Let's take a look at the code artifacts manipulated by the compilation.

Original code

Original code. This is the code that you would ordinarily check into your sccs. If you are "starting from Java", then your original code is the Java code, not the .xsd or .wsdl code. If you are "starting from schema", your original code is the .xsd or .wsdl code, not the Java code.

In addition, "original code" includes "user-friendly" configuration files. For example, when using JAXB, original code includes the JAXB configuration annotations in the schema. In the 109 case, configuration may consist of a standard 109 mapping file; in 101, configuration may be supplied using any number of mechanisms e.g., annotations in the code.

You may not be able to build your whole application using just the original code. If you are starting from Java, then any xml-oriented tools will not know what schema types your Java classes correspond to, and if you are starting from schema, then your original code will not include any Java classes or interfaces that you can program against. Before you can even link your code, you need to translate your original code so that it is represented in both XML and Java type systems.

There may be several different forms of 'original code', and there may be several very different ways of compiling original code into 'primary code'. For example, the way to work with and configure JAXB will potentially be very different from the way to work with and configure XMLBeans, even though they are both "start from schema".

Tylar source code

Tylar source. Unlike 'original code', which may come in many different forms, 'tylar source' always comes in the same form. It is compiled (and often packaged as a JAR), and contains the following:

  1. Compiled .class files representing the Java data types that applications are to be using for their xml-bound data. These classes may be interfaces or classes, depending on which binding features are being used.
  2. Compiled .xsb files representing the XML data types that the applications are to be using for their data.
  3. A binding-config.xml file representing the binding between the Java types and the XML types in the file.
  4. (Possibly...) If JSR 109 is applicable, a portable JSR 109 jaxrpc-mapping.xml file in order to comply with JSR 109's 'portable code' requirement. If JSR 109 is not being used, the 109 mapping file can be omitted.

When the JSR 109 mapping file is included in the portable JAR, it is not intended to be modified there; it is included to improve interoperability with 109 implementations other than XMLBeans.
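As a rough illustration of the list above, the following Java sketch checks whether a JAR has the three mandatory parts of tylar source. The class TylarSourceCheck is hypothetical, the in-memory JAR is only there to keep the example self-contained, and the assumption that binding-config.xml may sit at any path inside the JAR is ours, since the layout has not been specified.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

final class TylarSourceCheck {

    /** Builds an in-memory JAR of empty entries, standing in for a real tylar. */
    static byte[] buildJar(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (String name : entryNames) {
                jar.putNextEntry(new JarEntry(name));
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Lists the required parts that are absent; an empty list means none are missing. */
    static List<String> missingParts(byte[] jarBytes) {
        boolean hasClass = false;
        boolean hasXsb = false;
        boolean hasConfig = false;
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry entry = jar.getNextJarEntry(); entry != null; entry = jar.getNextJarEntry()) {
                String name = entry.getName();
                hasClass |= name.endsWith(".class");
                hasXsb |= name.endsWith(".xsb");
                // Assumption: the config file may sit at any path inside the JAR.
                hasConfig |= name.endsWith("binding-config.xml");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        if (!hasClass) missing.add("compiled .class files");
        if (!hasXsb) missing.add("compiled .xsb files");
        if (!hasConfig) missing.add("binding-config.xml");
        return missing;
    }
}
```

The optional JSR 109 mapping file is deliberately not checked, because its absence is legal.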

Specifically, there are certain things that are not included in the 'portable code':

  1. Code-generated serialization and deserialization code (e.g., code to call setters and getters) is not included.

  2. Any derived or computed binding information that is inferred from binding-config.xml is NOT there.

In the 'start from both java and xml' use case, the developer assembles their 'tylar source code' themselves. They do this by separately compiling (without translating) their .java and their .xsd files, and by supplying a hand-written binding-config.xml file.

Since binding-config.xml will be pretty explicit and probably quite long, the real-world way to use 'start from both' is to begin by invoking the translator to get XML and Java and binding-config that correspond to each other exactly, and then use that as a starting point for all three artifacts.

Because 'tylar source' must be able to be produced directly by a developer, tylar source must not contain any code that is code-generated from the binding-config.xml file. It may contain code that is validated against the binding-config.xml file; however, it should be reasonable to change the Java, schema, and binding-config.xml file and still expect the binding compiler to generate the rote binding code for you.

Compiled Tylar

Compiled Tylar. This code can actually be run. It includes all the code necessary to operate at runtime, including code-generated serialization and deserialization code if necessary.

The deployed code is allowed to include any amount of code-generated code based on binding-config.xml, and the JAR can also contain precomputed binaries and so on. The binding-config.xml may or may not actually be consulted at runtime, and should not be changed in the compiled tylar.

The central role of binding-config.xml

As you can see, the binding-config.xml file plays a central role in this model:

  1. All forms of binding and binding configuration must be able to be represented as entries in binding-config.xml.
  2. It is the main piece of information (other than the java classes and schema types themselves) that is driving marshalling and unmarshalling.

So this file is discussed as its own topic at XmlBeansV2BindingConfig.

Two forms of runtime binding

There are two kinds of runtime binding:

  1. Faster but lossy binding
  2. Lossless but slower binding

The faster form of binding marshals and unmarshals in a single pass, by calling setters or getters directly on concrete Java objects. In doing this, it potentially loses information about incoming XML streams that cannot be round-tripped when saving back out to XML.
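To make the information loss concrete, here is a minimal sketch of single-pass binding written against the JSR 173 streaming API (javax.xml.stream). The Item class, the element names, and FastItemBinding are all invented for the example; a real fast runtime would be driven by binding-config.xml rather than hard-coded names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** A plain value object; it has nowhere to keep XML it does not understand. */
final class Item {
    private String sku;
    private int quantity;

    String getSku() { return sku; }
    void setSku(String sku) { this.sku = sku; }
    int getQuantity() { return quantity; }
    void setQuantity(int quantity) { this.quantity = quantity; }
}

final class FastItemBinding {

    /** Single pass over the stream, calling setters directly. */
    static Item unmarshal(String xml) {
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Item item = new Item();
            while (in.hasNext()) {
                if (in.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // comments, text, and end tags are skipped
                }
                String name = in.getLocalName();
                if (name.equals("sku")) {
                    item.setSku(in.getElementText());
                } else if (name.equals("quantity")) {
                    item.setQuantity(Integer.parseInt(in.getElementText().trim()));
                }
                // Unrecognized elements fall through and are never stored.
            }
            in.close();
            return item;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Single pass over the object, calling getters directly. */
    static String marshal(Item item) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("item");
            w.writeStartElement("sku");
            w.writeCharacters(item.getSku());
            w.writeEndElement();
            w.writeStartElement("quantity");
            w.writeCharacters(Integer.toString(item.getQuantity()));
            w.writeEndElement();
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the input carries a comment or an element such as note that Item has no property for, unmarshal followed by marshal produces a document without them, which is exactly the round-trip loss described above.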

The lossless form of binding requires that the Java objects synchronize their state with an XML infoset tree. Loading and saving operates by loading and saving the XML infoset, and binding works incrementally at runtime when the Java objects query or push their state to the tree. Lossless binding is only supported in situations where there is a one-to-one correspondence of typed Java objects with interior nodes of the XML infoset. For styles of binding that are not one-to-one, lossless binding is difficult enough that we believe it will be out-of-scope for v2.
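A minimal sketch of the one-to-one case, using a W3C DOM tree as a stand-in for the XML infoset store. TreeBackedItem and LosslessDemo are hypothetical names, and the real lossless runtime is not expected to be DOM-based; the sketch only shows that a Java object which reads and writes through to its own interior node leaves everything else in the tree untouched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** One typed Java object per interior node; it holds no state of its own. */
final class TreeBackedItem {
    private final Element element;

    TreeBackedItem(Element element) {
        this.element = element;
    }

    /** Queries the tree on every call. */
    int getQuantity() {
        return Integer.parseInt(child("quantity").getTextContent().trim());
    }

    /** Pushes the new state into the tree immediately. */
    void setQuantity(int quantity) {
        child("quantity").setTextContent(Integer.toString(quantity));
    }

    private Element child(String name) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        throw new IllegalStateException("missing <" + name + ">");
    }
}

final class LosslessDemo {

    /** Loading is just loading the tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Saving is just saving the tree. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Changing the quantity through TreeBackedItem and then serializing keeps comments and unbound elements in place, at the cost of a tree lookup on every getter and setter call.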

Some very rough notes on the faster-but-lossy binding problem will be developed at XmlBeansV2FastRuntime. JAXB is a particularly special case that needs to be dissected.

In addition, a specification is needed for the lossless binding solution; it will be developed at XmlBeansV2LosslessRuntime.

The central role of binding-config.xml

Again, both fast and lossless binding are guided by entries in binding-config.xml. In particular, individual binding steps are broken down into binding types that are each individually configured. A binding type describes how to instantiate and fill in (or decompose) a Java object and translate it from (or to) the corresponding XML, and it may of course delegate to other binding types to handle nested data.
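The delegation idea might look something like this in Java. The BindingType interface and its implementations are invented for illustration and are not the catalog from XmlBeansV2BindingConfig; only the unmarshalling direction is shown, and a DOM element stands in for whatever XML representation the runtime ends up using.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical binding type: knows how to build one kind of Java value from XML. */
interface BindingType<T> {
    T unmarshal(Element xml);
}

final class StringBinding implements BindingType<String> {
    public String unmarshal(Element xml) {
        return xml.getTextContent();
    }
}

final class IntBinding implements BindingType<Integer> {
    public Integer unmarshal(Element xml) {
        return Integer.valueOf(xml.getTextContent().trim());
    }
}

final class LineItem {
    final String sku;
    final int quantity;

    LineItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

/** A structured binding type that delegates each nested element to another binding type. */
final class LineItemBinding implements BindingType<LineItem> {
    private final BindingType<String> skuBinding;
    private final BindingType<Integer> quantityBinding;

    LineItemBinding(BindingType<String> skuBinding, BindingType<Integer> quantityBinding) {
        this.skuBinding = skuBinding;
        this.quantityBinding = quantityBinding;
    }

    public LineItem unmarshal(Element xml) {
        return new LineItem(
                skuBinding.unmarshal(child(xml, "sku")),
                quantityBinding.unmarshal(child(xml, "quantity")));
    }

    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("missing <" + name + ">");
    }
}

final class BindingTypeDemo {
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because LineItemBinding receives its nested binding types as constructor arguments, each one can be configured or swapped independently, which is the property the paragraph above asks for.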

A catalog of these types is described at XmlBeansV2BindingConfig.

User Scenarios

The model above is designed to cover the following scenarios:

  1. start from schema
  2. start from Java
  3. start from both

Start from schema

Alice has some number of .xsd files she wishes to use in a web service. She uses them as follows:

1. Compile the .xsd files using either the xbeans or the jaxb compiler. The result, a "primary artifact", is a set of Java classes packaged into a JAR together with binding configuration information that she can use. There are three styles of compilation from schema:

  1. XMLBeans style - interfaces
  2. JAXB style - interfaces
  3. JSR 101 style - classes

For example, the "purchase-order" schema type might be translated into a Java class called "PurchaseOrder".

2. Use the translated Java classes as arguments in a web service, by writing a .jws file that passes them as arguments.

void submitOrder(PurchaseOrder order);

3. When deploying, do a "secondary compile" that supplies binding implementations for the classes generated in the first step.

4. When a WSDL is generated, the relevant schema (defining PurchaseOrder in this case) is included into the generated WSDL and used in the definition of the submitOrder operation.

5. When using XMLBeans style arguments, the web service operations can be annotated to select either "full fidelity" or "lossy" binding. Lossy binding is the only option for the other styles of binding.
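For the interface-based styles, steps 1 through 3 could fit together along these lines. This is a sketch only: the customer property, OrderService, and PurchaseOrderImpl are invented, and the implementation class is hand-written here purely as a stand-in for whatever the secondary compile would actually generate.

```java
import java.util.ArrayList;
import java.util.List;

/** Step 1 output, XMLBeans style: the schema compiler emits an interface. */
interface PurchaseOrder {
    String getCustomer();
    void setCustomer(String customer);
}

/** Step 2, hand-written: the service only ever sees the interface. */
final class OrderService {
    private final List<String> customers = new ArrayList<>();

    void submitOrder(PurchaseOrder order) {
        customers.add(order.getCustomer());
    }

    List<String> submittedCustomers() {
        return customers;
    }
}

/** Step 3 stand-in: the deployment-time compile would supply something like this. */
final class PurchaseOrderImpl implements PurchaseOrder {
    private String customer;

    public String getCustomer() {
        return customer;
    }

    public void setCustomer(String customer) {
        this.customer = customer;
    }
}

final class StartFromSchemaDemo {
    public static void main(String[] args) {
        PurchaseOrder order = new PurchaseOrderImpl();
        order.setCustomer("Alice");
        OrderService service = new OrderService();
        service.submitOrder(order);
        System.out.println(service.submittedCustomers());
    }
}
```

OrderService compiles against the interface alone, so it can be written and linked before any binding implementation exists.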

Start from Java

1. Write web service methods as Java classes. Write data classes as Java value objects with getters and setters.

2. Compile the java classes into a web service.

3. As a side-effect of web service compilation, the data types are turned into xml schema types. This can be done in either JSR 101 or XMLBeans style, selectable by controlling the .wsdl compiler.

Certain Java constructs may require special configuration; there is a user-supplied config file, or annotations in the Java. Users can supply custom codecs.

The result of this process is a portable jar that contains compiled java classes, xml schema types, and a binding-config.xml file.

4. When deployed, again, a deployed JAR is created that may include more code.

Deployment is allowed to fail if not all required artifacts are present. For example, if the binding-config.xml specifies that lossless binding is to be permitted, then the Java classes can be checked to verify that they follow a specific pattern and implement a specific interface.

Annotations on the Java class support:

  * ability to specify QName
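One possible shape for the QName annotation, sketched with Java language annotations as the carrier. The annotation name XmlTypeName, its two members, the example namespace, and the fallback for unannotated classes are all assumptions made for this example; the actual annotation mechanism has not been decided here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import javax.xml.namespace.QName;

/** Hypothetical annotation naming the schema type a Java class binds to. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface XmlTypeName {
    String namespace();
    String localName();
}

/** A value object with getters and setters, annotated with its QName. */
@XmlTypeName(namespace = "http://example.com/orders", localName = "purchase-order")
final class AnnotatedPurchaseOrder {
    private String customer;

    public String getCustomer() {
        return customer;
    }

    public void setCustomer(String customer) {
        this.customer = customer;
    }
}

final class QNameLookup {

    /** Reads the annotation reflectively, as a start-from-Java translator might. */
    static QName qnameFor(Class<?> type) {
        XmlTypeName name = type.getAnnotation(XmlTypeName.class);
        if (name == null) {
            // Assumption: unannotated classes default to their simple name in no namespace.
            return new QName("", type.getSimpleName());
        }
        return new QName(name.namespace(), name.localName());
    }
}
```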

Start from both

1. Take existing compiled classes and WSDLs or schemas that were generated from the pre-deployment steps of any of the above scenarios. In other words, when starting from both, you begin by working with the source code for a "portable JAR".

2. Modify either the Java or the WSDL by adding extra methods or by making minor modifications to the methods that are there. Commonly signatures will not change at all.

However, if there are changes in signatures, then the developer must modify binding-config.xml by hand to match the changes.

3. Run the deployment compiler, supplying both the Java and the WSDL or schema, and the binding-config.xml.

4. If the compiler detects any mismatch, it will complain. Otherwise, the compiler will build a deployable JAR.
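The kind of mismatch check meant in step 4 could be sketched as below. The map from Java property name to XML element name is a simplified stand-in for entries in binding-config.xml, the set of element names is a stand-in for the compiled schema, and BindingMismatchCheck is a hypothetical name; a real deployment compiler would check considerably more than this.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** The hand-maintained Java side of a start-from-both project. */
final class Order {
    private int quantity;

    public int getQuantity() {
        return quantity;
    }

    public void setQuantity(int quantity) {
        this.quantity = quantity;
    }
}

final class BindingMismatchCheck {

    /** Returns one complaint per inconsistency; an empty list means the three artifacts agree. */
    static List<String> findMismatches(Class<?> javaType,
                                       Map<String, String> propertyToElement,
                                       Set<String> schemaElements) {
        List<String> complaints = new ArrayList<>();
        for (Map.Entry<String, String> entry : propertyToElement.entrySet()) {
            String property = entry.getKey();
            String element = entry.getValue();
            String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
            // The Java side must expose a getter and a setter for every bound property.
            if (!hasMethod(javaType, "get" + suffix, 0) || !hasMethod(javaType, "set" + suffix, 1)) {
                complaints.add(javaType.getSimpleName()
                        + " has no getter/setter pair for '" + property + "'");
            }
            // The schema side must declare every element the config refers to.
            if (!schemaElements.contains(element)) {
                complaints.add("schema has no element '" + element + "'");
            }
        }
        return complaints;
    }

    private static boolean hasMethod(Class<?> type, String name, int parameterCount) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == parameterCount) {
                return true;
            }
        }
        return false;
    }
}
```

An empty result corresponds to the compiler going on to build the deployable JAR; any complaint corresponds to the developer needing to fix the Java, the schema, or binding-config.xml by hand.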

Feature Grid

Start from java, start from schema, JSR 101, 109, JAXB, XMLBeans, fast binding, and lossless binding are all supported. But not all at the same time. The following grid describes some of the relationships:

|                   | JSR 101   | JSR 109   | JAXB        | XMLBeans    | Portable JAR  |
| Fast binding      | Y         | Y         | Y           | Y           | Y             |
| Lossless binding  | ?         | ?         | N           | Y           | if configured |
| to tylar src      | from Java | from Java | from schema | from schema | n/a           |
| to start-from J+X | Y         | Y         | Y           | Y           | Y             |

Components

The following diagram outlines some of the major functional components that need to be developed.

    +-----------------------------+              +----------------------------+ 
    |  Java metadata access API   |              | SOM layer (abstracts XSD ) | 
    +-----------------------------+              +----------------------------+ 

 +-------------------------+   +-------------------------+   +-------------------------+ 
 | 101/109 start-from-java |-+ | JAXB start-from-schema  | +-| XMLBeans start-from-xsd | 
 +-------------------------+ | +-------------------------+ | +-------------------------+ 
                             |              |              | 
 +-------------------------+ |              v              | +-------------------------+ 
 | 101/109 start-from-xsd  |-+->+-----------------------+<-+-| "Start from both" helper| 
 +-------------------------+    | BINDING CONFIG FILE   |    +-------------------------+ 
                                +-----------------------+ 
 +-------------------------+      |          |         |     +-------------------------+ 
 | Java code generation    |<-----+          |         +---->| Schema generation       | 
 +-------------------------+                 |               +-------------------------+ 
                                             | 
                                             v 
                +------+         +--------------------+         +-----+ 
                | Java |-------->|  Binding deployer  |<--------| XSD | 
                +------+         +--------------------+         +-----+ 
                                    |                | 
                                    |                | 
                                    v                v 
              +--------------------------+      +------------------------+ 
              | Fast runtime serializer  |      | Lossless runtime       | 
              +--------------------------+      +------------------------+ 
              | Fast runtime deserializer| 
              +--------------------------+ 

Binding-config.xml file

  * Detail exactly what's in this file

JAM

  * Abstracts Java reflection information

SOM

  * Abstracts XSD

101/109 binding compiler frontend

  * Start with Java plus 109 jaxrpc-mapping file

JAXB binding compiler frontend

  * Start with JAXB annotated schema

XMLBeans binding compiler frontend

  * Start with schema plus config file

"Start from both" helper frontend

  * Start with both schema and Java that have a very straightforward relationship

Java generation backend

  * Start with binding-config.xml containing only easy-java types

Schema generation backend

  * Start with binding-config.xml containing only easy-schema types

Binding deployer

  * Start with Java + schema + binding-config.xml

Fast runtime deserializer

  * Given input "rich 173 stream"

Fast runtime serializer

  * Given a graph of objects

Lossless runtime

  * Given objects that know how to synchronize

Possible implementation order

We'll need to figure out what work needs to be done and organize it in order and parcel it out. For first-cuts of the implementation, here is an initial list.

  1. Specification of what the binding types are (binding-config.xml)
  2. 101 start from schema compiler
  3. lossy deserializer
  4. lossy serializer
  5. 101 start from java compiler
  6. JAXB start from schema
  7. deserializer (supporting JAXB)
  8. serializer (supporting JAXB)
  9. XmlBeans start from schema
  10. Lossless runtime
  11. Firm up the "start from both" utility
  12. Loop back and finish everything.

XmlBeansV2BindingArch (last edited 2009-09-20 23:31:33 by localhost)