AreaTreeIntermediateXml/NewDesign

This page describes the proposed new design for FOP's intermediate format. The goals can be found on the parent page.

Basic ideas

Sketching out a new XML format

<document xmlns="http://xmlgraphics.apache.org/fop/metafile" xmlns:xlink="http://www.w3.org/1999/xlink">
  <header>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>New Intermediate Format Demo Document</dc:title>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    <bookmarks....
      [PDF bookmarks]
    </bookmarks>
  </header>
  <page index="1" name="1">
    <page-header>
      <ps:ps-setup-code>%FOPTestPSSetupCode: General setup code here!</ps:ps-setup-code>
    </page-header>
    <content>
      <box transform="translate(5000, 6000)" width="18000" height="10000">
        <font family="Helvetica" style="normal" weight="400" variant="normal" size="12000"
          color="black"/>
        <text x="1233" y="803" dx="0 0 20 0 0">Hello</text>
        <draw-rect x="1233" y="1200" width="20000" height="20000" fill="yellow" stroke="none"/>
        <box transform="translate(1233, 1200)" width="20000" height="20000" clip="true">
          <image xlink:href="myimage.svg" x="0" y="0" width="20000" height="20000"/>
        </box>
      </box>
      [..]
    </content>
  </page>
  <page...
</document>

Other needed elements

Old design (for reference)

http://people.apache.org/~jeremias/fop/renderer-design-old.png

New design

http://people.apache.org/~jeremias/fop/renderer-design-new.png

IFPainter design

IFPainter (working title, better suggestions welcome!) is a central interface, like Renderer. There is one implementation for each output format that is useful in the context of the intermediate format (probably all current renderers except text; the most important are PostScript, AFP and PCL). Ideally, the IFPainter interface is a direct equivalent of the SAX stream for the new IF format (IF-NG), i.e. it is possible to convert between IFPainter calls and the IF-NG SAX stream without loss. The IFContentHandler in the graphic above would convert the SAX stream to IFPainter calls, and a special IFPainter implementation used by IFRenderer could convert the calls to the SAX stream. That way, the IFRenderer could also render directly to an IFPainter without the detour via XML.

The IFPainter interface is not fully designed yet, so the following only gives an idea of what it could look like (all methods will probably throw SAXException):

public interface IFPainter {

    void setUserAgent(FOUserAgent userAgent);
    void setResult(Result result);
    boolean supportsPagesOutOfOrder();

    void startDocument();
    void endDocument();

    void startDocumentHeader();
    void endDocumentHeader();

    void startPageSequence(String id);
    void endPageSequence();

    void startPage(int index, String name, Dimension size);
    void endPage();

    void startPageHeader();
    void endPageHeader();

    void startPageContent();
    void endPageContent();

    void startPageTrailer();
    void addTarget(String name, int x, int y);
    void endPageTrailer();

    void startBox(AffineTransform transform, Dimension size, boolean clip);
    void startBox(AffineTransform[] transforms, Dimension size, boolean clip);
    //For transform parsing, Batik's org.apache.batik.parser.TransformListHandler/Parser can be reused
    void endBox();

    void setFont(String family, String style, Integer weight, String variant, Integer size, String color);
    //All of setFont()'s parameters can be null if no state change is necessary
    void drawText(int x, int y, int[] dx, int[] dy, String text);
    void drawRect(Rectangle rect, Paint fill, Color stroke);
    void drawImage(String uri, Rectangle rect); //external images
    void startImage(Rectangle rect); //followed by a SAX stream (SVG etc.)
    void endImage();

    void handleExtensionObject();
    //etc. etc.
}

public class IFState {

    //all font traits
    //list of transforms since the last state save (by startBox())
    //maybe the effective clip shape
}

//additional needed classes
public class IFSerializer implements IFPainter {

    public IFSerializer(ContentHandler handler) {
    [..]

    //convert IFPainter calls to XML (IF-NG)
}

public class IFContentHandler implements ContentHandler {

    public IFContentHandler(IFPainter painter) {
    [..]

    //convert SAX stream calls to IFPainter calls

}
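
The SAX-to-painter direction can be illustrated with a minimal sketch. It assumes a drastically reduced, hypothetical MiniPainter interface in place of the (still undesigned) IFPainter and only handles the page and text elements from the format sketch above; everything else in the stream is ignored. The class names MiniPainter, MiniIFContentHandler and SaxToPainterDemo are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, reduced stand-in for IFPainter (not the real interface).
interface MiniPainter {
    void startPage(int index, String name);
    void endPage();
    void drawText(int x, int y, String text);
}

// Converts SAX events for <page> and <text> into MiniPainter calls.
class MiniIFContentHandler extends DefaultHandler {
    private final MiniPainter painter;
    private final StringBuilder text = new StringBuilder();
    private int textX;
    private int textY;
    private boolean inText;

    MiniIFContentHandler(MiniPainter painter) {
        this.painter = painter;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("page".equals(localName)) {
            painter.startPage(Integer.parseInt(atts.getValue("", "index")),
                    atts.getValue("", "name"));
        } else if ("text".equals(localName)) {
            textX = Integer.parseInt(atts.getValue("", "x"));
            textY = Integer.parseInt(atts.getValue("", "y"));
            text.setLength(0);
            inText = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks, so it is buffered.
        if (inText) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("text".equals(localName)) {
            painter.drawText(textX, textY, text.toString());
            inText = false;
        } else if ("page".equals(localName)) {
            painter.endPage();
        }
    }
}

class SaxToPainterDemo {
    // Parses the XML and returns the painter calls it produced, in order.
    static List<String> replay(String xml) {
        List<String> calls = new ArrayList<>();
        MiniPainter recorder = new MiniPainter() {
            public void startPage(int index, String name) {
                calls.add("startPage " + index + " " + name);
            }
            public void endPage() {
                calls.add("endPage");
            }
            public void drawText(int x, int y, String t) {
                calls.add("drawText " + x + " " + y + " " + t);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new MiniIFContentHandler(recorder));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return calls;
    }
}
```

A serializer in the opposite direction would implement the same painter interface and emit the corresponding startElement()/endElement() calls, which is what makes the round trip possible in principle.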

Note that the IFPainter should be designed so that it is easy to write some kind of filter (like java.io.FilterOutputStream) in which implementors can react to certain events such as startPageContent() and add their own content calls (content enrichment) for things like barcodes, OMR marks, background images etc.
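
A minimal sketch of such a filter, assuming a hypothetical three-method PagePainter interface as a stand-in for IFPainter. FilterPagePainter, MarkingPainter and RecordingPainter are illustrative names only, and the "mark" is drawn as plain text just to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for IFPainter (not the real interface).
interface PagePainter {
    void startPageContent();
    void drawText(int x, int y, String text);
    void endPageContent();
}

// Pass-through base class in the spirit of java.io.FilterOutputStream:
// every call is forwarded unchanged to the wrapped painter.
class FilterPagePainter implements PagePainter {
    protected final PagePainter delegate;

    FilterPagePainter(PagePainter delegate) {
        this.delegate = delegate;
    }

    public void startPageContent() {
        delegate.startPageContent();
    }

    public void drawText(int x, int y, String text) {
        delegate.drawText(x, y, text);
    }

    public void endPageContent() {
        delegate.endPageContent();
    }
}

// Content enrichment: adds a mark at the start of every page's content.
class MarkingPainter extends FilterPagePainter {
    MarkingPainter(PagePainter delegate) {
        super(delegate);
    }

    @Override
    public void startPageContent() {
        super.startPageContent();
        delegate.drawText(0, 0, "|||"); // hypothetical OMR-style mark
    }
}

// Records calls as strings; stands in for a real output format here.
class RecordingPainter implements PagePainter {
    final List<String> calls = new ArrayList<>();

    public void startPageContent() {
        calls.add("startPageContent");
    }

    public void drawText(int x, int y, String text) {
        calls.add("drawText " + x + " " + y + " " + text);
    }

    public void endPageContent() {
        calls.add("endPageContent");
    }
}
```

Because the filter implements the same interface it wraps, several filters can be chained in front of the final painter without either side knowing about the other.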

Performance evaluation compared to previous approach

Performance is expected to be higher for the following reasons:

Resource Management (Idea)

Some formats like PostScript and AFP require special processing to optimize resources (images, fonts etc.). The PostScript renderer currently supports an optional two-pass approach in which the resources are added to the beginning of the PostScript file only in the second pass, i.e. once it is known which resources are needed. The idea now is to enrich the IF renderer with a mechanism to track used resources so that a second pass can be avoided when producing the final output format. After all, the IF renderer already processes the full document and knows which resources are necessary.

We can make three categories of renderers:

  1. no resource optimization (Java2D-based output formats, PCL, Text, SVG)

  2. implicit resource optimization (PDF due to its object structure, Mars)

  3. explicit resource optimization (PostScript, AFP)

Please note that this mechanism is only useful for the third category, so it will not be enabled unless explicitly requested. For formats like PDF this processing is not necessary since resources are added as they are needed.

We define a listener interface that receives notification of resource usage (ResourceUsageListener). The RenderingResource interface will be implemented by a handful of classes (at the beginning: FontResource & ImageResource). The object identity of these classes is defined by what the IF supports:

We need some infrastructure to keep track of resource usage on page-, page-sequence- and document-level. The most important is page-level. The other levels just summarize the accumulated data. Keeping track of resources down to page-level serves the following purposes:

The resource usage information can optionally be integrated into the IF. For this purpose the IF is extended by a structure that is inserted into the page trailer and the document trailer. If desired the IF renderer could also support writing a separate file parallel to the generated IF, if the information needs to be tracked somewhere (could be implemented as a special resource listener).

Subformat:

<resource-usage>
  <font family="Arial" weight="normal" style="normal" variant="normal" size="10pt" count="1"/>
  <image uri="http://xmlgraphics.apache.org/fop/images/logo.jpg" count="1"/>
</resource-usage>
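
The listener and resource classes described above could look roughly like the following sketch. The interface and class names come from the text, but the method names (resourceUsed(), toXml()), the CountingListener class and the choice of identity fields (taken from the attributes of the subformat) are assumptions made for illustration. Attribute values are not XML-escaped here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical shape of the resource abstraction; toXml() is an assumption.
interface RenderingResource {
    String toXml(int count);
}

// Identity is defined by the font traits shown in the subformat.
final class FontResource implements RenderingResource {
    private final String family, weight, style, variant, size;

    FontResource(String family, String weight, String style, String variant, String size) {
        this.family = family;
        this.weight = weight;
        this.style = style;
        this.variant = variant;
        this.size = size;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FontResource)) {
            return false;
        }
        FontResource f = (FontResource) o;
        return family.equals(f.family) && weight.equals(f.weight) && style.equals(f.style)
                && variant.equals(f.variant) && size.equals(f.size);
    }

    @Override
    public int hashCode() {
        return Objects.hash(family, weight, style, variant, size);
    }

    public String toXml(int count) {
        return "<font family=\"" + family + "\" weight=\"" + weight + "\" style=\"" + style
                + "\" variant=\"" + variant + "\" size=\"" + size + "\" count=\"" + count + "\"/>";
    }
}

// Identity is defined by the image URI alone.
final class ImageResource implements RenderingResource {
    private final String uri;

    ImageResource(String uri) {
        this.uri = uri;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImageResource && uri.equals(((ImageResource) o).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    public String toXml(int count) {
        return "<image uri=\"" + uri + "\" count=\"" + count + "\"/>";
    }
}

// Hypothetical callback signature for the listener named in the text.
interface ResourceUsageListener {
    void resourceUsed(RenderingResource resource);
}

// Counts usages per distinct resource and writes the subformat.
class CountingListener implements ResourceUsageListener {
    private final Map<RenderingResource, Integer> counts = new LinkedHashMap<>();

    public void resourceUsed(RenderingResource resource) {
        counts.merge(resource, 1, Integer::sum);
    }

    String toXml() {
        StringBuilder sb = new StringBuilder("<resource-usage>\n");
        for (Map.Entry<RenderingResource, Integer> e : counts.entrySet()) {
            sb.append("  ").append(e.getKey().toXml(e.getValue())).append('\n');
        }
        return sb.append("</resource-usage>").toString();
    }
}
```

Implementing equals() and hashCode() on the resource classes is what lets two separately created FontResource objects with the same traits be counted as one resource.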

As a side note, resource counting makes it possible to move only those resources to the document resources that are needed on more than one page. Resource consumption on the target device can become too high if every resource is moved to the document resources unconditionally.
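
The page-level bookkeeping behind this decision can be sketched as follows. The PageResourceTracker class and its methods are hypothetical, and resources are identified by plain strings here to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Remembers on which pages each resource was used (hypothetical helper).
class PageResourceTracker {
    private final Map<String, Set<Integer>> pagesByResource = new LinkedHashMap<>();

    void resourceUsed(int pageIndex, String resourceKey) {
        pagesByResource.computeIfAbsent(resourceKey, k -> new TreeSet<>()).add(pageIndex);
    }

    // Resources needed on more than one page: candidates for the document resources.
    List<String> documentLevelResources() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : pagesByResource.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Resources needed only on the given page: these can stay with that page.
    List<String> pageLevelResources(int pageIndex) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : pagesByResource.entrySet()) {
            Set<Integer> pages = e.getValue();
            if (pages.size() == 1 && pages.contains(pageIndex)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Note that a resource used several times on a single page still counts as page-level in this sketch; only the number of distinct pages matters.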

This whole idea adds some complexity but makes it possible to avoid the two-pass approach for PS and AFP generation, which reduces throughput.

TODO

Comments

[1] JM: I wonder how far we'll need to go for tagged PDF. Do people only need the basic structure of the document (separating headers from flow content, maybe indicating block roles (para, title, footnote, ...) through extension attributes on fo:block)? Or does someone need the whole tagged PDF feature set, which basically allows you to carry many semantics of the original FO document over into the PDF (spaces, indents, baseline shifts, alignment, etc.)?

[2] AC: Taking an SVG 1.2 subset (Tiny?) format, how slow would Batik be as a final renderer for FOP? JM: as already indicated in the first place: quite slow. The process: build-up of DOM tree, build-up of GVT tree, rendering GVT tree to Graphics2D, conversion of Graphics2D calls to final format. Too many steps in between. I suspect it would even be slower than today's solution. What we need is something that can be streamed and processed on the fly without building up too many intermediate structures in memory. BTW, Tiny is already too powerful for what we need. And I really don't intend to write a second Batik. If we decide to use an SVG subset, it would be an "SVG Nano". :-) I'll try to formulate a minimal format first, trying to stay as close to SVG as possible. From there, we can check if this can be fully mapped to SVG elements. The risk is not needing all of SVG's features but needing many extensions, which could again make the parsing too slow because of the growing complexity.

last edited 2008-07-25 08:18:22 by JeremiasMaerki