Substitution Groups FAQ

Introduction. What are substitution groups?

Substitution groups in XMLSchema extend the concept of type substitution (a.k.a. polymorphism) to XML elements. Example:

{{{
<xs:element name="a" type="type"/>
<xs:element name="b" type="derivedtype" substitutionGroup="a"/>
}}}

This XMLSchema snippet declares two elements, named "a" and "b", and moreover specifies that "b" is in the substitution group headed by "a". This makes it possible to use "b" in every place that "a" would be valid in an instance document. Example: If

{{{
<container>
  <a/>
</container>
}}}

is a valid instance document, then so is

{{{
<container>
  <b/>
</container>
}}}

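Note that for the Schema definitions to be valid, the type "derivedtype" must be derived from "type" using XMLSchema type derivation, as in:

{{{
<xs:complexType name="type">
  ...
</xs:complexType>
<xs:complexType name="derivedtype">
  <xs:complexContent>
    <xs:extension base="type">
      ...
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
}}}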

(one can of course use <xs:restriction> in place of <xs:extension>)
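
The substitution rule can be checked with the validator that ships with the JDK. The following is a minimal sketch, not part of XMLBeans: the class name `SubstitutionGroupDemo`, the empty contents of "type", the `extra` attribute on "derivedtype" and the declaration of `container` are all assumptions added only to make the schema complete.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SubstitutionGroupDemo {
    // Hypothetical schema: the type contents and the <container> declaration
    // are assumptions; only "a", "b" and the two type names come from the text.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='type'/>"
      + "<xs:complexType name='derivedtype'>"
      + "<xs:complexContent><xs:extension base='type'>"
      + "<xs:attribute name='extra' type='xs:string'/>"
      + "</xs:extension></xs:complexContent>"
      + "</xs:complexType>"
      + "<xs:element name='a' type='type'/>"
      + "<xs:element name='b' type='derivedtype' substitutionGroup='a'/>"
      + "<xs:element name='container'><xs:complexType><xs:sequence>"
      + "<xs:element ref='a' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<container><a/></container>")); // head element
        System.out.println(isValid("<container><b/></container>")); // substituted member
        System.out.println(isValid("<container><c/></container>")); // undeclared element
    }
}
```

Under these assumptions the first two documents validate and the third does not, because "c" is neither "a" nor a member of its substitution group.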

Why would one use substitution groups?

In general, there is no reason to, especially in data-oriented environments. If you use XML to encode and pass data around, there is nothing you can achieve with substitution groups that you cannot also achieve with normal type substitution (xsi:type).

Why is it so hard to work with substitution groups from Java?

Basically the main problem arises from the fact that while XMLSchema types map almost naturally to Java types, XMLSchema elements (and attributes) only map to JavaBeans-style properties, which don't have polymorphism. To finish drawing the parallel, it would be like being able to say in Java "I want to use calls to method x() instead of calls to method y() on every object on which y() is a legal method call". That would really be confusing, now, wouldn't it? Since this mismatch exists, it's challenging to come up with a good translation of this concept into Java.

How does XmlObject work with types/elements?

This does not relate to substitution groups in particular, but it is relevant to note that XmlObjects model Schema types (which is the natural mapping), not Schema elements. The difference is that a Schema element is a container, while a Schema type is a description of the contents of a container (that's why you can have an element and an attribute both having the same type). Therefore, an instance of an XmlObject (typed or not) represents the contents of a container (element or attribute) and not the container itself. As such, when you call MyXmlObject.Factory.newInstance() you are creating the contents of a generic XML element, waiting to be assigned somewhere in your XML document using a "setXXX()" method.

How do I use substitution groups in a read-only scenario?

Let's say you have declared an element named "a" and an element "b" in the substitution group headed by "a", as in the introduction. Now let's assume the following input document:

{{{
<container>
  <a>...</a>
  <b>...</b>
</container>
}}}

Then you can do (assuming normal generated XmlBeans):

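{{{
Type[] contents = container.getAArray();
for (int i = 0; i < contents.length; i++) {
    // process contents[i]; both "a" and "b" elements are returned here
    ...
}
}}}

That's all good, but you can achieve the same processing semantics with an XML document that contains only "a" elements and marks the derived contents with xsi:type instead of substituting "b".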



If you want to know whether substitution groups were used or not, your code would have to look like

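{{{
Type[] contents = container.getAArray();
QName b = new QName("", "b");
for (int i = 0; i < contents.length; i++) {
    XmlCursor cursor = contents[i].newCursor();
    boolean substituted = b.equals(cursor.getName()); // true if written as <b>
    cursor.dispose();
    ...
}
}}}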
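
For readers who want to try the name check without generated XmlBeans classes, here is a rough equivalent using the DOM API from the JDK instead of XmlCursor. It is only a sketch: the class `SubstitutionCheck`, the method `describeChildren` and the three labels it returns are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SubstitutionCheck {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // For each child element of the root, reports how its type was signalled:
    // by element name (substitution group), by xsi:type, or not at all.
    static List<String> describeChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> result = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                Element child = (Element) n;
                if ("b".equals(child.getLocalName())) {
                    result.add("substitution group");
                } else if (child.hasAttributeNS(XSI, "type")) {
                    result.add("xsi:type");
                } else {
                    result.add("declared type");
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<container xmlns:xsi='" + XSI + "'>"
            + "<a/><b/><a xsi:type='derivedtype'/></container>";
        // prints [declared type, substitution group, xsi:type]
        System.out.println(describeChildren(xml));
    }
}
```

The point is the same as in the XmlBeans loop: the element name is the only thing that distinguishes a substituted "b" from an "a" carrying xsi:type.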

How do I use substitution groups in write scenarios (v2)?

Use the XmlObject.substitute(QName newName, SchemaType newType) method.

{{{
Type t = ...;
Derivedtype d = (Derivedtype) t.substitute(new QName("", "b"), Derivedtype.type);
}}}

This method checks for compatibility between the types, between the element QName and its type, and for substitution group affiliations before changing the element name. Note that newType may differ from the declared type of the element named newName, which results in substitution groups and xsi:type being used at the same time; this is of course legal.

How do I use substitution groups in write scenarios (v1)?

Here's how to use it in a bottom-up write scenario (using the "setXXX()" methods), given the Schema definition from Introduction.

{{{
ContainerDocument.Container c = ...;
Derivedtype d = ...;
// This sets up the contents, since it is an XmlObject assignment
c.setAArray(1, d);
// This sets up the element name, for substitution groups
c.getAArray(1).newCursor().setName(new QName("", "b"));
}}}

Note that, since the setter is called "setAArray", the container's element name would be "a" and the type would be what's passed in, consistent with the rule that elements are mapped to JavaBeans properties and types to XmlObjects.

In a top-down write scenario (using the "addNewXXX()" methods), the following code snippet would do it:

{{{
ContainerDocument.Container c = ...;
Type t = c.addNewA();
// This sets up the substituted name
t.newCursor().setName(new QName("", "b"));
// This retrieves the XmlObject of the new type, on which sets can be called
Derivedtype d = (Derivedtype) c.getAArray(1);
}}}

Note that this approach too requires the use of "newCursor().setName()" and involves getting back from the parent an object of the new, derived type.

Why doesn't XmlBeans generate additional methods for every possible substitution, to support substitution groups?

This has to do with what kind of information is discoverable at compile time. Suppose that we change the example provided in "Introduction" such that elements "a" (the substitution group head) and "b" (the substitution group member) are defined in different files. You can either compile both files in a single compilation step, or you can compile the file containing "a" first and then the file containing "b" (if "a" is useful in its own right, for example). When XmlBeans generates the setter for the uses of "a" (first compilation), it does not yet know that there is a substitution group involved, so no extra methods are generated. When it compiles "b" (second compilation), it notices that a substitution group for "a" is defined, but it is too late to change "a", since its classes have already been generated and may be in use somewhere else. This contrasts with the scenario where both files are compiled at the same time, where additional methods could be generated.

In summary, the generated methods would depend upon the order in which the source Schema documents were compiled, which would be strange at the very least. In Java, the contents of a compiled .class file do not depend on what other sources were compiled at the same time, so it would not feel right to do that in Schema. And the case in which the file containing "a" was compiled first would still not be solved. So we would have an anomaly in behavior that only helps solve half of the problem; we are better off without it.

Why doesn't XmlBeans figure out automatically when I call "setXXX()" which substitution group it should use?

That is a good question, but there is a good answer for it too. First of all, when calling a setter and passing in an XmlObject of a specific type, it is entirely possible to use type substitution instead of substitution groups (i.e. use xsi:type) to achieve the same effect and a valid document. So there would need to be an option telling XmlBeans "use substitution groups whenever possible". Even then, there could exist multiple elements having the same type. So fine, XmlBeans should pick whichever element in the substitution group of the declared element has as its type the run-time type that was passed in to setXXX().

Unfortunately, the question "find me all elements in a given substitution group" is not answerable at run time. Because one can always drop additional jars on the classpath containing new types and elements, some of these elements can augment the existing substitution groups. Since a Java ClassLoader can answer questions like "find me resource x", but can't answer questions like "enumerate all resources of this kind", one can never build a list of all available SchemaTypes (or Elements) that have a certain property. I think that the convenience and consistency of SchemaTypes behaving like Java types with respect to classloading and compile-time vs. run-time behavior far outweighs the minor inconvenience of having to write one extra line of code when using substitution groups :-)

Note. There is the case of anonymous types, which don't have the two properties discussed above: they don't work with type substitution and they can only be contained in a single element, to which they are local. So when anonymous types are used, one could generate the substitution needed, but that would contrast with the case where types are not anonymous, so it's not at all clear that it would be a good idea to default to substitution groups when anonymous derived types are used.
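
The ClassLoader limitation mentioned above can be seen in the shape of its API. The sketch below is illustrative only: `ResourceLookupDemo` and the second resource name are made up, and nothing here is how XmlBeans itself looks up Schema components.

```java
import java.net.URL;

public class ResourceLookupDemo {
    // "Find me resource x": a lookup by exact name, which a ClassLoader supports.
    static boolean canFind(String resourceName) {
        URL url = ClassLoader.getSystemClassLoader().getResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // A resource whose exact name is known in advance can be located.
        System.out.println(canFind("java/lang/String.class"));
        // A name nobody provides is simply reported as absent (hypothetical name).
        System.out.println(canFind("hypothetical/no/such/Resource.bin"));
        // ClassLoader.getResources(name) also takes one exact name; there is no
        // method that accepts a pattern or a "kind" and lists every match.
    }
}
```

Every query starts from a name the caller already knows, which is why a complete list of substitution group members cannot be assembled from the classpath at run time.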

SubstGroupsFaq (last edited 2009-09-20 23:34:18 by localhost)