
Overview

The Digester object needs a set of Rule objects that define how the input XML is processed. These can be configured through the Java API:

  digester.addObjectCreate("some/path", ...);
  digester.addSetProperties("some/path");

or via an xmlrules file:

  Digester digester = DigesterLoader.createDigester(rulesURL);

where rulesURL refers to a file looking like this:

  <digester-rules>
    <pattern value="address-book">
      <pattern value="person">
        <object-create-rule classname="Person"/>
        <set-properties-rule/>
        ...
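For comparison, here is a minimal sketch of roughly the same mapping written against the Java API. It assumes Digester 1.x/2.x (package `org.apache.commons.digester`, plus its BeanUtils and Logging dependencies) on the classpath. `Person`, `AddressBook`, `addPerson` and the extra set-next rule are hypothetical, because the xmlrules fragment above is cut off before it shows how the created objects are attached.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;

public class AddressBookRules {

    // Hypothetical bean; SetPropertiesRule copies <person> attributes into it.
    public static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Hypothetical root object; addPerson is invoked by the SetNextRule.
    public static class AddressBook {
        private final List<Person> people = new ArrayList<Person>();
        public void addPerson(Person p) { people.add(p); }
        public List<Person> getPeople() { return people; }
    }

    public static AddressBook parse(String xml) throws Exception {
        Digester digester = new Digester();

        // Push the root object before parsing starts.
        AddressBook book = new AddressBook();
        digester.push(book);

        // Java equivalent of the object-create and set-properties rules above.
        digester.addObjectCreate("address-book/person", Person.class);
        digester.addSetProperties("address-book/person");
        // Assumed extra rule (not in the fragment): attach each Person to the book.
        digester.addSetNext("address-book/person", "addPerson");

        digester.parse(new StringReader(xml));
        return book;
    }
}
```

Everything lives in one place here: the rules sit next to the classes they populate, and a typo in a method name shows up when stepping through plain Java code.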

Supporters of xmlrules

A number of articles recommend the use of xmlrules, including:

A number of books published about Jakarta Commons projects also recommend xmlrules.

The original authors of the xmlrules project obviously saw sufficient advantages to make it worth implementing the feature.

Opponents of xmlrules

One Digester developer (Simon Kitching) suggests avoiding the xmlrules module and staying with the standard Java API for configuring rules. No other recent Digester developer has expressed an opinion on this subject.

Simon's reasons for avoiding xmlrules

While the xmlrules approach does indeed work, I'm just not convinced it brings much benefit. When I first encountered Digester, I started using the xmlrules module because it seemed to be the "highest abstraction". But in practice I found that the XML being parsed and the Java classes being created/populated are conceptually tightly coupled anyway. There is rarely any need to change the XML->class mapping unless the classes themselves change. And if the classes are changing, then updating a mapping defined in code is no more work than updating a mapping defined in an external rules file.

And anyone writing an application with Digester will already be fluent in Java, so moving the mapping from code to an external XML file doesn't make life any easier in that respect. In fact, I think it makes things harder: I find the API easier to comprehend than the xmlrules format. And if you have any "bugs" in your mapping, you really need to know how the rule classes work anyway. So in summary, the learning curve is *steeper* for xmlrules than for the underlying API.

And it's a nuisance for the Digester library maintainers, because every new feature added to the API has to be added to xmlrules as well. Writing unit tests for xmlrules is a nuisance too.

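And xmlrules adds significant overhead when processing small input files, because the xmlrules file must itself be parsed to set up all the rules before the real input is read; with a small input, that extra parse makes up a large share of the total cost.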

And some features simply can't be accessed via xmlrules. One example is passing references to arbitrary Java objects via the ObjectParamRule.
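A minimal sketch of what this looks like with the API, assuming Digester 1.x/2.x (`org.apache.commons.digester`) on the classpath. `Digester.addObjectParam` registers an ObjectParamRule that hands an already-existing Java object to a preceding `addCallMethod` rule; `AppContext`, `Service`, `Services` and the element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;

public class ObjectParamExample {

    // Hypothetical application object that cannot be described in XML.
    public static class AppContext {
        private final String env;
        public AppContext(String env) { this.env = env; }
        public String getEnv() { return env; }
    }

    // Hypothetical bean created for each <service> element.
    public static class Service {
        private String name;
        private AppContext context;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public AppContext getContext() { return context; }
        public void setContext(AppContext context) { this.context = context; }
    }

    // Hypothetical root object collecting the services.
    public static class Services {
        private final List<Service> all = new ArrayList<Service>();
        public void add(Service s) { all.add(s); }
        public List<Service> getAll() { return all; }
    }

    public static Services parse(String xml, AppContext context) throws Exception {
        Digester digester = new Digester();
        Services root = new Services();
        digester.push(root);

        digester.addObjectCreate("services/service", Service.class);
        digester.addSetProperties("services/service");
        // Call setContext(AppContext) on each new Service. The parameter type
        // must be given explicitly, otherwise a String parameter is assumed.
        digester.addCallMethod("services/service", "setContext", 1,
                new Class<?>[] { AppContext.class });
        // ObjectParamRule: parameter 0 is the caller's object, not XML data.
        digester.addObjectParam("services/service", 0, context);
        digester.addSetNext("services/service", "add");

        digester.parse(new StringReader(xml));
        return root;
    }
}
```

The `context` reference exists only at runtime, so there is nothing an external rules file could name to supply it.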

In addition, none of the original authors of the xmlrules module are still active in the project, so help with this module may not be as readily available as support for the plain Java API.

I can see xmlrules being useful in some situations. For example, when writing a code-generation tool (e.g. one working from a UML diagram), it may be easier to generate xmlrules definitions than calls to the Digester API.

But in general, I think xmlrules is *harder to learn*, has runtime CPU and memory overhead, and brings no practical benefit.

And one other thing: it is very easy to add custom Rule classes when using the API; it's somewhat more complex to do so when using xmlrules (though not impossible).
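To show how little is involved on the API side, here is a sketch of a custom Rule subclass registered with a single `addRule` call. It assumes Digester 1.x/2.x (`org.apache.commons.digester`), where `Rule` exposes `body(namespace, name, text)` and `getDigester()`; `TrimmedTextRule`, `Notes` and the element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.Rule;

public class CustomRuleExample {

    // Hypothetical target object that collects note texts.
    public static class Notes {
        private final List<String> items = new ArrayList<String>();
        public void add(String text) { items.add(text); }
        public List<String> getItems() { return items; }
    }

    // Custom rule: trims the matching element's body text and passes it
    // to the object currently on top of the Digester stack.
    public static class TrimmedTextRule extends Rule {
        @Override
        public void body(String namespace, String name, String text) throws Exception {
            Notes notes = (Notes) getDigester().peek();
            notes.add(text.trim());
        }
    }

    public static List<String> parse(String xml) throws Exception {
        Digester digester = new Digester();
        Notes notes = new Notes();
        digester.push(notes);
        // Registering the custom rule is one ordinary method call.
        digester.addRule("notes/note", new TrimmedTextRule());
        digester.parse(new StringReader(xml));
        return notes.getItems();
    }
}
```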

Parsing xml files in multiple languages

The XML files that Digester parses can be written in multiple languages; e.g. an address book might look like:

  <addressbook>
    <entry>
      ...
    </entry>
  </addressbook>

or might look like:

  <foodebarbar>
    <yahdedoday>
      ...
    </yahdedoday>
  </foodebarbar>

In this case the parsing rules stay the same, but the pattern-matching strings associated with those rules change. With the rule definitions in an xmlrules file, translation would be simple: only the pattern values would need editing.

However, the same can be achieved with the Java API:

  digester.addObjectCreate(props.getProperty("ADDRESSBOOK_PATTERN"), ...);

where an external properties file maps the keys to the appropriate pattern string.
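A fuller sketch of the properties-driven approach, assuming Digester 1.x/2.x (`org.apache.commons.digester`) on the classpath. `ADDRESSBOOK_PATTERN` comes from the snippet above; `ENTRY_PATTERN`, `Entry`, `AddressBook`, `addEntry` and the `name` attribute are hypothetical. The English patterns act as defaults, and a properties file only needs to override them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.commons.digester.Digester;

public class LocalizedPatterns {

    // Hypothetical bean for one address-book entry.
    public static class Entry {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Hypothetical root object created by the first ObjectCreateRule.
    public static class AddressBook {
        private final List<Entry> entries = new ArrayList<Entry>();
        public void addEntry(Entry e) { entries.add(e); }
        public List<Entry> getEntries() { return entries; }
    }

    public static AddressBook parse(String xml, Properties patterns) throws Exception {
        // Pattern strings come from the properties; English names are defaults.
        String bookPattern = patterns.getProperty("ADDRESSBOOK_PATTERN", "addressbook");
        String entryPattern = patterns.getProperty("ENTRY_PATTERN", "addressbook/entry");

        Digester digester = new Digester();
        // The rules are identical for every language; only the patterns vary.
        digester.addObjectCreate(bookPattern, AddressBook.class);
        digester.addObjectCreate(entryPattern, Entry.class);
        digester.addSetProperties(entryPattern);
        digester.addSetNext(entryPattern, "addEntry");

        // parse() returns the root object, i.e. the AddressBook.
        return (AddressBook) digester.parse(new StringReader(xml));
    }

    // Loads a pattern mapping from text in java.util.Properties format.
    public static Properties load(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }
}
```

Supporting the second dialect is then a matter of shipping a file with `ADDRESSBOOK_PATTERN=foodebarbar` and `ENTRY_PATTERN=foodebarbar/yahdedoday`; no Java code changes.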
