Object Pipelines

One common use case is unmarshalling large XML documents which consist of a regular pattern of sub-trees. For example:

<?xml version='1.0'?>
<root>
  <objects>
    <object>
       ...
    </object>
    <object>
       ...
    </object>
...
    <object>
       ...
    </object>
  </objects>
</root>

Each document fragment enclosed by an object tag needs to be unmarshalled to an object, and each object then needs to be processed in turn. The usual way to do this is to unmarshal the whole document into a collection and then process each object in that collection.

DOM-based binding solutions are likely to use a lot of memory, since they must read the whole document and all the objects into memory before processing can begin.

SAX-based binding solutions (such as Betwixt) have an advantage in this case. They can process each fragment in turn without ever having to keep the whole document in memory. Unfortunately, they still need to keep all the objects in memory, because the objects are collected before any of them is processed.

Object Pipelines

One solution is to use a SAX-based binder combined with an object pipeline. The idea is that the binder supplies each object to the consumer for processing as soon as it becomes available, so a processed object no longer needs to be retained.

This should be pretty easy to add to Betwixt (and other binders).
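
The pipeline idea can be illustrated with a minimal sketch that uses only the JDK's SAX parser. This is not Betwixt's API: the names ObjectPipelineSketch, PipelineHandler and parse are hypothetical, and each "object" is reduced to the text content of its element instead of a mapped bean. The sketch also assumes that object elements are not nested inside one another.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of an object pipeline; not part of Betwixt.
class ObjectPipelineSketch {

    /** Builds one value per <object> element and hands it to the consumer at once. */
    static class PipelineHandler extends DefaultHandler {
        private final Consumer<String> consumer;
        private StringBuilder text; // non-null only while inside an <object> element

        PipelineHandler(Consumer<String> consumer) {
            this.consumer = consumer;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("object".equals(qName)) {
                text = new StringBuilder(); // start collecting a new fragment
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("object".equals(qName)) {
                // The fragment is complete: pass it downstream and drop our reference.
                consumer.accept(text.toString().trim());
                text = null;
            }
        }
    }

    /** Parses the XML and pushes each object to the consumer as soon as it is complete. */
    static void parse(String xml, Consumer<String> consumer) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new PipelineHandler(consumer));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><objects><object>a</object><object>b</object></objects></root>";
        parse(xml, value -> System.out.println("processing " + value));
    }
}
```

Because the handler keeps a reference only to the fragment currently being built, memory use stays roughly constant however many object elements the document contains, provided the consumer does not retain the objects itself.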
