


FOP Avalonization and API Redesign

The purpose of this document is the redesign of the API visible to embedders and how to apply Avalon interfaces and concepts.

Author(s): JoergPietschmann (pij), JeremiasMaerki (jm)

Minor Contributor(s): VictorMote (wvm)

API issues were already discussed in thread starting here, here and here.

API requirements

Former content here moved to the ApiRequirements page.

Startup Concepts Proposal (wvm)

The class names below are descriptive only, although I think they would be useful in an implementation.

  • <code>Session</code>. There should be exactly one <code>Session</code> object. It controls one or more <code>Document</code> objects.

  • <code>Document</code>. Can control one or more <code>RenderContext</code> objects.

  • <code>RenderContext</code>. Manages one or more <code>PageSequence</code> objects. Can control zero or more <code>RenderType</code> objects. Our current <code>RenderContext</code> concepts are <code>NativeFOP</code>, <code>AWT</code>, and <code>StructureRender</code>.

  • <code>RenderType</code>.

The primary benefit sought here is that all static variables now have a comfortable dynamic object in which to live instead. For example, a collection of font objects might be stored in the <code>Session</code> object where it can be reused, while a similar collection of fonts used in a document might be stored in the <code>Document</code> object where it can be used to control font embedding in a <code>Renderer</code>.

Separation of Concerns and control also have a better structure with this approach. The <code>Document</code> controls the process of building the FO Tree. As each <code>PageSequence</code> in the FO Tree is completed, control is returned to <code>Document</code>. <code>Document</code> might then (for example) look at the <code>RenderContext</code> objects that it needs to manage. If it only sees one, it might then send that page sequence directly to the <code>RenderContext</code>. If it sees more than one, it might instead first serialize the <code>PageSequence</code> so that it can be reused. (This is a simplistic and perhaps not good example, but illustrates a flow of control that I think would be useful). A <code>RenderContext</code> manages the layout process and controls the <code>RenderType</code> objects, again making decisions about when a page (with "eager" rendering), or a page-sequence (with "patient" rendering) should be sent to the <code>Renderer</code>, and whether it should be thrown away or saved for another <code>Renderer</code>.

There are opportunities here for meta-information about FOP's processing. A <code>RenderContext</code> might, for example, keep a collection of <code>PageSequenceMeta</code> objects that includes information about each <code>PageSequence</code> (even though the real <code>PageSequence</code> object may already be recycled). Page numbering is a potentially useful example.

Each break in this scheme is an opportunity for clean multithreading. Depending on user instructions (and perhaps other information available to FOP), <code>Session</code> could start multiple <code>Document</code> objects, each in a separate thread. <code>Document</code> could start multiple <code>RenderContext</code> objects, each in a separate thread. <code>RenderContext</code> could do the same with <code>Renderer</code> objects.
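The threading idea can be illustrated with a small sketch. This is not FOP code: the class names follow the proposal, but the `Callable`-based `Document`, the fixed thread pool, and the string results are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a Session that processes each Document on its own thread.
final class SessionThreadingSketch {

    /** Stand-in for a Document; call() is where FO tree building would happen. */
    static final class Document implements Callable<String> {
        private final String inputFile;

        Document(String inputFile) {
            this.inputFile = inputFile;
        }

        @Override
        public String call() {
            return "processed " + inputFile;
        }
    }

    static final class Session {
        private final List<Document> documents = new ArrayList<>();

        Document addDocument(String inputFile) {
            Document document = new Document(inputFile);
            documents.add(document);
            return document;
        }

        /** Runs all queued documents concurrently; results come back in queue order. */
        List<String> process() {
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, documents.size()));
            try {
                List<String> results = new ArrayList<>();
                for (Future<String> future : pool.invokeAll(documents)) {
                    results.add(future.get());
                }
                return results;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while processing", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("document processing failed", e.getCause());
            } finally {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.addDocument("a.fo");
        session.addDocument("b.fo");
        System.out.println(session.process());
    }
}
```

The same pattern could be repeated one level down, with a <code>Document</code> submitting its <code>RenderContext</code> objects to a pool.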

As far as keeping track of context, the <code>PageSequence</code> becomes critical. It should contain convenience methods <code>getSession()</code>, <code>getDocument()</code>, and <code>getRenderContext()</code>. Within the FO Tree building and Layout processes, the <code>PageSequence</code> is the only thing that needs to get passed around. All other context information can be derived conveniently from it. (Alternatively, FObj could have a method, inherited by all subclasses, that finds the ancestor <code>PageSequence</code>, or it could simply add this as a variable to each FObj-subclassed object when created. The point is that you can conveniently get to the data.)

The above point has possible implications for logging as well. Again, the <code>PageSequence</code> can track whatever logger it should be using, and perhaps have a method to do the logging. So anytime I am near an FObj subclass and want to log something, I do: <code>SomeFObj.getPageSequence().logMessage(DEBUG_MSG, "Here is a debug message");</code>. This could also be implemented in the <code>RenderContext</code>, or if we don't need logging separated between <code>RenderContext</code> objects, in a higher object. I think all you need to tote around in each class then is an import statement to whichever class this is implemented in.
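A minimal sketch of this context lookup and logging path follows. The method names <code>getSession()</code>, <code>getDocument()</code>, <code>getRenderContext()</code>, <code>getPageSequence()</code> and <code>logMessage()</code> come from the text above; everything else is assumed for illustration, in particular that a <code>PageSequence</code> is bound to exactly one <code>RenderContext</code> and that the log is a plain list held by the <code>Session</code>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of context lookup through PageSequence; not FOP code.
final class PageSequenceContextSketch {

    static final class Session {
        final List<String> log = new ArrayList<>(); // stand-in for a real logger
    }

    static final class Document {
        private final Session session;

        Document(Session session) {
            this.session = session;
        }

        Session getSession() {
            return session;
        }
    }

    static final class RenderContext {
        private final Document document;

        RenderContext(Document document) {
            this.document = document;
        }

        Document getDocument() {
            return document;
        }
    }

    static final class PageSequence {
        private final RenderContext renderContext;

        PageSequence(RenderContext renderContext) {
            this.renderContext = renderContext;
        }

        RenderContext getRenderContext() {
            return renderContext;
        }

        Document getDocument() {
            return renderContext.getDocument();
        }

        Session getSession() {
            return getDocument().getSession();
        }

        /** Routes the message to the log held by the owning Session. */
        void logMessage(String level, String message) {
            getSession().log.add(level + ": " + message);
        }
    }

    /** Stand-in for an FObj subclass that carries only its PageSequence. */
    static final class SomeFObj {
        private final PageSequence pageSequence;

        SomeFObj(PageSequence pageSequence) {
            this.pageSequence = pageSequence;
        }

        PageSequence getPageSequence() {
            return pageSequence;
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        PageSequence sequence = new PageSequence(new RenderContext(new Document(session)));
        SomeFObj fobj = new SomeFObj(sequence);
        fobj.getPageSequence().logMessage("DEBUG", "Here is a debug message");
        System.out.println(session.log);
    }
}
```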

Alternatively (and this might be better practice), we can simply tote the <code>RenderContext</code> everywhere. At least everything would be encapsulated in it, so that you don't have to carry multiple things around for multiple purposes (fonts, images, etc.).

I don't understand Avalon well enough yet to know how or whether the above scheme fits in with it. However, I think the above hierarchy would serve us well, whether as Avalon services or some other implementation.

As an afterthought, it seems that the <code>RenderContext</code> concept is the same as, or has a 1-to-1 mapping with, the concept of a <code>LayoutStrategy</code>. <code>LayoutStrategy</code> is a topic for another proposal, but in summary it gives the user the ability to choose among several layout strategies for laying out a document.

Here is an example of the code needed to start up a FOP processing session: <code>

    session = new Session();
    document = session.addDocument(inputFile);
    document.addRenderType(RENDER_PDF, LAYOUT_SIMPLE, outputFile1);
    document.addRenderType(RENDER_POSTSCRIPT, LAYOUT_SIMPLE, outputFile2);
    document.addRenderType(RENDER_PDF, LAYOUT_CLASSIC, outputFile3);
    document.addRenderType(RENDER_PRINT);
    document.process();
    // -OR-
    session.process();  // if you want session to manage a queue of documents

</code>

If desirable, the <code>Session</code>, <code>Document</code>, and <code>RenderType</code> classes can have "set" methods for various parameters, so that the constructors are simplified.

From a big-picture control standpoint:

    Document = FOTree 
    RenderContext = AreaTree 
    RenderType = Renderer 

The <code>document.addRenderType()</code> not only creates needed <code>RenderType</code> objects, but also creates any <code>RenderContext</code> objects that are needed, three in this example (the first two can share the same <code>AreaTree</code>, each of the others requires a different one). When <code>document.process()</code> is run, it looks at the <code>RenderContext</code> objects to determine whether an <code>FOTree</code> needs to be built. In this case it does. It can then loop through the <code>RenderContext</code> objects to see what, if any, layout work needs to be done, and build an <code>AreaTree</code> based on the output type and the selected <code>LayoutStrategy</code>. Each <code>RenderContext</code> object will then loop through the <code>RenderTypes</code> which are attached to it to fire up Renderers. So in the example above, the same <code>AreaTree</code> will be used to spit out the first two <code>RenderTypes</code> before trying to build the <code>AreaTree</code> needed for the "CLASSIC" layout. The user doesn't really need to know about <code>RenderContext</code> -- its purpose is to facilitate reuse of a constructed <code>AreaTree</code>.
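The grouping described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the proposed implementation: render types are grouped into a <code>RenderContext</code> purely by a layout key, a render type added without a layout gets a key of its own (the made-up <code>"default-for-"</code> prefix), and <code>process()</code> returns a trace of steps instead of doing real work.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: addRenderType() creates RenderContexts on demand so that
// render types with the same layout key can share one AreaTree.
final class RenderContextGroupingSketch {

    static final class RenderType {
        final String renderer;
        final String output;

        RenderType(String renderer, String output) {
            this.renderer = renderer;
            this.output = output;
        }
    }

    static final class RenderContext {
        final String layoutKey;
        final List<RenderType> renderTypes = new ArrayList<>();

        RenderContext(String layoutKey) {
            this.layoutKey = layoutKey;
        }
    }

    static final class Document {
        // Insertion-ordered so contexts are processed in the order they were created.
        private final Map<String, RenderContext> contexts = new LinkedHashMap<>();

        void addRenderType(String renderer, String layout, String output) {
            contexts.computeIfAbsent(layout, RenderContext::new)
                    .renderTypes.add(new RenderType(renderer, output));
        }

        /** Assumption: a render type without a layout gets a context of its own. */
        void addRenderType(String renderer) {
            addRenderType(renderer, "default-for-" + renderer, null);
        }

        int contextCount() {
            return contexts.size();
        }

        /** Returns the sequence of steps a real implementation would perform. */
        List<String> process() {
            List<String> trace = new ArrayList<>();
            if (contexts.isEmpty()) {
                return trace; // nothing to render, so no FOTree is needed
            }
            trace.add("build FOTree");
            for (RenderContext context : contexts.values()) {
                trace.add("build AreaTree [" + context.layoutKey + "]");
                for (RenderType type : context.renderTypes) {
                    trace.add("render " + type.renderer);
                }
            }
            return trace;
        }
    }

    public static void main(String[] args) {
        Document document = new Document();
        document.addRenderType("pdf", "simple", "out1.pdf");
        document.addRenderType("postscript", "simple", "out2.ps");
        document.addRenderType("pdf", "classic", "out3.pdf");
        document.addRenderType("print");
        document.process().forEach(System.out::println);
    }
}
```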

Of course, there are a number of configuration options available as well, all of which can be attached to the appropriate object by a servlet programmer. The actual use of those options is in other objects (and indeed, should be in other packages), but the /control/ mechanism can live in these four classes. Please note mainly that <code>RenderType</code> is not the same as a Renderer, but handles control information for the Renderer so that it doesn't have to be tightly coupled with other packages (specifically layout).

API proposal

Discussion points below. <code>{{{
package org.apache.fop.api;

public class FOPException
    extends org.apache.avalon.framework.CascadingException {
}


package org.apache.fop.api;

public class FOProcessor
    implements org.apache.avalon.framework.configuration.Configurable,
               org.apache.avalon.framework.logger.LogEnabled {

    private org.apache.avalon.framework.logger.Logger logger;
    private org.apache.fop.fo.FOTreeBuilder foTreeBuilder;
    private org.apache.fop.api.Renderer renderer;
    // add configuration data

    public FOProcessor() { }

    public void configure(org.apache.avalon.framework.configuration.Configuration configuration) {
    }
    public void enableLogging(org.apache.avalon.framework.logger.Logger logger) { }

    public void render(org.apache.fop.api.Source source, org.apache.fop.api.Renderer renderer)
        throws FOPException {
    }
    public void setRenderer(org.apache.fop.api.Renderer renderer) { }
    public org.xml.sax.ContentHandler getContentHandler() { }

    // specific configuration methods
    public void setSourceResolver(org.apache.avalon.excalibur.SourceResolver resolver) { }
    public org.apache.avalon.excalibur.SourceResolver getSourceResolver() { }
    public void setImageResolver(ImageResolver resolver) { }
    public ImageResolver getImageResolver() { }
    public void setFontResolver(FontResolver resolver) { }
    public FontResolver getFontResolver() { }

    // convenience methods to clone processors
    public void setConfiguration(FOProcessorConfiguration configuration) { }
    public FOProcessorConfiguration getConfiguration() { }
}


package org.apache.fop.api;

public class FOProcessorFactory
    implements org.apache.avalon.framework.configuration.Configurable {

    public static FOProcessorFactory newInstance() { }

    public void configure(org.apache.avalon.framework.configuration.Configuration configuration) {
    }
    public FOProcessor newProcessor() { }
    public FOProcessor newProcessor(org.apache.fop.api.Renderer renderer) { }

    // specific configuration methods
    public void setSourceResolver(org.apache.avalon.excalibur.SourceResolver resolver) { }
    public org.apache.avalon.excalibur.SourceResolver getSourceResolver() { }
    public void setImageResolver(ImageResolver resolver) { }
    public ImageResolver getImageResolver() { }
    public void setFontResolver(FontResolver resolver) { }
    public FontResolver getFontResolver() { }

    // convenience methods to clone processors
    public void setConfiguration(FOProcessorConfiguration configuration) { }
    public FOProcessorConfiguration getConfiguration() { }
}


package org.apache.fop.api;

public class FOProcessorConfiguration {
    private org.apache.avalon.excalibur.SourceResolver sourceResolver;
    private ImageResolver imageResolver;
    private FontResolver fontResolver;
    private String baseURL;
    private String logLevel;
    // more fields to add
}


package org.apache.fop.api;

// input source abstraction
public interface Source {}


package org.apache.fop.api.helper;

public class AbstractSource implements Source {
    java.lang.String getSystemId() {}
    void setSystemId(java.lang.String systemId) {}
}


package org.apache.fop.api.stream;

public class StreamSource extends AbstractSource {
    StreamSource() {}
    StreamSource(java.io.File f) {}
    StreamSource(java.io.InputStream inputStream) {}
    StreamSource(java.io.InputStream inputStream, java.lang.String systemId) {}
    StreamSource(java.io.Reader reader) {}
    StreamSource(java.io.Reader reader, java.lang.String systemId) {}
    StreamSource(java.lang.String systemId) {}
    java.io.InputStream getInputStream() {}
    java.lang.String getPublicId() {}
    java.io.Reader getReader() {}
    void setInputStream(java.io.InputStream inputStream) {}
    void setPublicId(java.lang.String publicId) {}
    void setReader(java.io.Reader reader) {}
    void setSystemId(java.io.File f) {}
}


package org.apache.fop.api.sax;

public class SAXSource extends AbstractSource {
    SAXSource() {}
    SAXSource(org.xml.sax.InputSource inputSource) {}
    SAXSource(org.xml.sax.XMLReader reader, org.xml.sax.InputSource inputSource) {}
    org.xml.sax.InputSource getInputSource() {}
    org.xml.sax.XMLReader getXMLReader() {}
    void setInputSource(org.xml.sax.InputSource inputSource) {}
    void setXMLReader(org.xml.sax.XMLReader reader) {}
}


package org.apache.fop.api.dom;

public class DOMSource extends AbstractSource {
    DOMSource() {}
    DOMSource(org.w3c.dom.Node node) {}
    DOMSource(org.w3c.dom.Node node, java.lang.String systemID) {}
    org.w3c.dom.Node getNode() {}
    void setNode(org.w3c.dom.Node node) {}
}


package org.apache.fop.api;

public interface Renderer
    extends org.apache.avalon.framework.configuration.Configurable,
            org.apache.avalon.framework.logger.LogEnabled {

    public void configure(org.apache.avalon.framework.configuration.Configuration configuration);
    public void enableLogging(org.apache.avalon.framework.logger.Logger logger);

    // -- Get Specific Renderer Type --
    public org.apache.fop.renderer.StructureRenderer getStructureRenderer();
    public org.apache.fop.renderer.PrintRenderer getPrintRenderer();
    public org.apache.fop.renderer.AWTRenderer getAWTRenderer();

    // -- OR --
    public org.apache.fop.apps.LayoutHandler getLayoutHandler();
}


package org.apache.fop.api;

public interface ImageResolver {
    public org.apache.fop.image.Image resolve(String base, String href, String mimeType,
                                              HashMap genericParameters)
        throws FOPException;
}


package org.apache.fop.api;

public interface FontResolver {
    public org.apache.fop.font.Font resolve(String base, String fontBase, String fontName, )
        throws FOPException;
}


package org.apache.fop.api.helper;

public class ImageResolverImpl {
    public org.apache.fop.image.Image resolve(String base, String href, String mimeType,
                                              HashMap genericParameters)
        throws FOPException {
    }
}


package org.apache.fop.api.helper;

public class FontResolverImpl {
    public org.apache.fop.font.Font resolve(String base, String fontBase, String fontName,
                                            String fontStyle, String fontWeight,
                                            String fontType, String metricsURI,
                                            HashMap genericParameters)
        throws FOPException {
    }
}


Configuration DTD (note this is syntactically invalid because of the end-of-line pseudo comments)

<!ELEMENT configuration (fonts?,core?,renderer*)>
<!ELEMENT fonts (font*)>
<!ELEMENT font (font-triplet+,parameter*)>
<!ATTLIST font
    name        CDATA    #REQUIRED -- some sort of base name (see issues)
    type        CDATA    #IMPLIED  -- TTF, ...
    base-uri    CDATA    #IMPLIED  -- where to find font related resources, if relative resolved against font-base-uri
    metrics-uri CDATA    #IMPLIED  -- override location of FOP specific metrics file
    embed       (yes|no) "yes"
    kerning     (yes|no) #REQUIRED>

<!ELEMENT font-triplet (parameter*)> <!-- map FO font families to configured font -->
<!ATTLIST font-triplet
    name   CDATA #REQUIRED -- FO font family name to map
    style  CDATA #REQUIRED
    weight CDATA #REQUIRED>

<!ELEMENT core (parameter*)>
<!ATTLIST core
    hyphenation-base-uri  CDATA #IMPLIED -- where to look for hyphenation data, must be absolute
    font-base-uri         CDATA #IMPLIED -- where to look for font related resources, must be absolute
    source-base-uri       CDATA #IMPLIED
    log-level             CDATA #IMPLIED
    logger-class          CDATA #IMPLIED -- may be inappropriate to have here
    source-resolver-class CDATA #IMPLIED
    image-resolver-class  CDATA #IMPLIED
    font-resolver-class   CDATA #IMPLIED
    xml-reader-class      CDATA #IMPLIED -- default: the JAXP default
    entity-resolver-class CDATA #IMPLIED -- default: the JAXP default
    >

<!ELEMENT renderer (parameter*)>
<!ATTLIST renderer
    name      CDATA #REQUIRED
    mime-type CDATA #IMPLIED>

<!ELEMENT parameter (key,(value | list))>
<!ELEMENT key (#PCDATA)>
<!ELEMENT value (#PCDATA)>
<!ELEMENT list (value|subentry)+>
<!ELEMENT subentry (key,value+)>
}}}</code>

Renderer specific configuration settings

  • PDF renderer: stream-filter-list, PDF metadata, PDF encryption related options
  • Text renderer: encoding
  • Area tree renderer: detail-level

Usage examples

<code>
{{{
// render a FO file into a PDF file, using all defaults
FOProcessor processor = FOProcessorFactory.newInstance().newProcessor();
processor.render(new org.apache.fop.api.stream.StreamSource(new File("stuff.fo")),
                 new PDFRenderer(new File("stuff.pdf")));

// render a DOM into a PDF file, using all defaults
FOProcessor processor = FOProcessorFactory.newInstance().newProcessor();
processor.render(new org.apache.fop.api.dom.DOMSource(mydocument),
                 new PDFRenderer(new File("stuff.pdf")));

// render XML+XSLT into a PDF file, using all defaults
FOProcessor processor = FOProcessorFactory.newInstance()
    .newProcessor(new PDFRenderer(new File("stuff.pdf")));
Transformer transformer = TransformerFactory.newInstance()
    .newTransformer(new javax.xml.transform.stream.StreamSource(new File("stuff.xsl")));
transformer.transform(new javax.xml.transform.stream.StreamSource(new File("stuff.xml")),
                      new javax.xml.transform.sax.SAXResult(processor.getContentHandler()));
}}}</code>

Alternative API proposal by (jm) under FOPAvalonization/AltAPIProposalJM.

Resolver specification

Stream source resolving

An Excalibur source resolver is used to resolve a pair (base URL,href URL) into a source stream. The resolver is by default invoked on:

  1. the processor source URL, with a null base
  2. the URL taken from a src property of a fo:external-graphic, with the base URL being the system id of the input document or null
  3. the URL taken from a src property of a fo:color-profile, with the base URL being the system id of the input document or null
  4. a URL synthesized from the hyphenation language, country and script properties, with the base URL being the configured hyphenation-base-URI
  5. a URL synthesized from the font name, type and, depending on the type, various other properties, with the base URL being the configured font-base-URI

If the URI to be resolved is a relative URI and the base URI is null, the default resolver will, hopefully, fall back to resolving against the current working directory.

A custom resolver can be used to provide caching, connection pooling, access control and support for custom URL schemes.
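The (base URL, href URL) resolution rule, including the fallback for a null base, can be sketched with plain <code>java.net.URI</code>. This is only an illustration of the rule and does not use the Excalibur <code>SourceResolver</code> API; the class and method names are made up for the example.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical sketch of (base, href) resolution; not the Excalibur resolver.
final class SourceResolvingSketch {

    /**
     * Resolves href against base. An absolute href is returned unchanged;
     * a null base falls back to the current working directory.
     */
    static URI resolve(String base, String href) {
        URI hrefUri = URI.create(href);
        if (hrefUri.isAbsolute()) {
            return hrefUri;
        }
        URI baseUri = (base != null)
                ? URI.create(base)
                : Paths.get("").toAbsolutePath().toUri();
        return baseUri.resolve(hrefUri);
    }

    public static void main(String[] args) {
        // src of an fo:external-graphic, resolved against the input document's system id
        System.out.println(resolve("http://example.org/docs/book.fo", "images/logo.png"));
        // the same href with no base: resolved against the working directory
        System.out.println(resolve(null, "images/logo.png"));
    }
}
```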

Image resolving

An image resolver is used to resolve a pair (base URL,href URL) into a FOP image object. The resolver is invoked each time an image is about to be rendered. In particular the resolver is called each time an image in static content or from within a retrieved marker is rendered.

The default implementation uses a simple cache and is guaranteed to return the same image object for the same resolved URI until the cache is explicitly reset. This will be used by renderers for output formats which can share resources used by images in different output locations in order to conserve space, if possible.

The default implementation uses the source resolver to resolve the (base,href) pair into a source with an associated resolved URI. The resolved URI is then used to look up the image object in a global cache. If there is no image object for the resolved URI, the source is read once, if necessary the image format type is detected, and a FOP image object is created.

If the source resolver does not attach an absolute URI to the resolved source, no cache lookup is performed and the source is read unconditionally.
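The caching behaviour described above could be sketched as follows. The names <code>CachingImageResolver</code>, <code>Image</code>, <code>reset()</code> and <code>loadCount()</code> are hypothetical, the "image" is just a wrapper around its URI, and the source resolver is reduced to <code>java.net.URI</code> resolution; only the caching rules are taken from the text.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the default image resolver's cache; not FOP code.
final class ImageCacheSketch {

    /** Stand-in for a FOP image object. */
    static final class Image {
        final URI source;

        Image(URI source) {
            this.source = source;
        }
    }

    static final class CachingImageResolver {
        private final Map<URI, Image> cache = new HashMap<>();
        private int loads;

        Image resolve(String base, String href) {
            URI resolved = (base == null)
                    ? URI.create(href)
                    : URI.create(base).resolve(href);
            if (!resolved.isAbsolute()) {
                // No absolute URI attached: skip the cache and read unconditionally.
                return load(resolved);
            }
            // Same resolved URI yields the same image object until reset() is called.
            return cache.computeIfAbsent(resolved, this::load);
        }

        /** Stand-in for reading the source once and detecting the image format. */
        private Image load(URI uri) {
            loads++;
            return new Image(uri);
        }

        void reset() {
            cache.clear();
        }

        int loadCount() {
            return loads;
        }
    }

    public static void main(String[] args) {
        CachingImageResolver resolver = new CachingImageResolver();
        Image first = resolver.resolve("http://example.org/doc/a.fo", "img/logo.png");
        Image second = resolver.resolve("http://example.org/doc/b.fo", "img/logo.png");
        System.out.println(first == second);
        System.out.println(resolver.loadCount());
    }
}
```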

A custom resolver can be used to provide customized caching at the image object level, support for custom image formats and support for non-stream input, for example SAX input for SVG images. A custom image resolver is not obliged to use the configured source resolver.

Font resolving

A font resolver is used to resolve a font base URI and a bunch of font defining properties into a FOP font object. In contrast to the image resolver the font resolver is invoked exactly once per processor run for a given tuple of font properties.

The default font resolver uses the configured source resolver to look up various font-type specific resources related to the font. The base URI is the configured font-base-uri, the href URI is synthesized from the font name and, depending on the type, a variety of other properties and type dependent data, like file extensions. It may also use the font lookup facilities provided by the underlying graphic environment, thereby probably ignoring any font-base-uri.

A custom resolver can be used to provide custom caching independent of processor runs or access to custom font formats.

Hyphenation data and color profile resolving

There is no specific hook for resolving hyphenation data or color profiles at the object level. Only the configured source resolver is used. The source resolver is invoked exactly once per processor run for each (base,href) pair pointing to a color profile or hyphenation data resource.

Entity resolving

The source resolver is not used for entity resolving. The user is responsible for configuring the system to use customized entity resolvers in source XML streams. This may include overriding the default source resolver.

Discussion points

  • Convenient configuration, both for the FOP core and the renderer [3]
    • [3] I think it would be better if the core and the renderer wouldn't be looked at separately. IMO a renderer is just a component/service in FOP's core. And we've also got StructureRenderers that also shouldn't be forgotten. (jm)

    • (pij) strongly disagree, see below
  • Hooks for users to plug in customized URL resolvers, caching mechanisms and possibly other stuff. [4]
    • [4] Yes, probably either through the simple API wrapper AND the central FOP configuration. (jm)

    • (pij) added class names to config.dtd
  • SVG might be specifically provided as SAX event stream. This means we need either a (parser,InputSource) pair or a j.x.transform.Source. Question: should the InputSourceResolver generally provide a pair (parser,InputSource)? What should be returned as parser for binary sources, like GIF images: null or a NullParser?

    • (pij) Is this point moot after all?
  • Finding fonts:
    • Variant 1: @base-uri points to the main resource for the font (base-uri="file:///C:/WinNT/fonts/ARIAL.TTF"). URIs of other resources are somehow derived, for example strip the extension and add another.

    • Variant 2: @base-uri + @name + (@type dependent extensions) point to the font related resources (base-uri="file:///C:/WinNT/fonts" name="ARIAL" type="true-type" -> look for file:///C:/WinNT/fonts/ARIAL.ttf and file:///C:/WinNT/fonts/ARIAL.xml)

    • The user can provide generic parameters to the default resolver in order to find resources with non-standard extensions or to override individual mappings.
  • Use a parameter object to the resolve() methods for FontResolver and ImageResolver instead of individual parameters? Seems to be easier to extend.

  • Is a single base URI sufficient for hyphenation, font etc.? Should we be able to configure a search path containing multiple URIs (spec can be lifted from Ant). What about the source base URI?
  • Do we need a hyphenation resolver?
  • We have a service reader for the element mapping (at least in the maintenance branch) as well as a default XML configuration. I suggest dropping both and compiling the default configuration from some source XML into a Java class.
    • In trunk there is no default XML config; the values will be in the code. (kll)

    • The service reader is for the automatic extension loader. What is the replacement? (kll)
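As an illustration of "Variant 2" under "Finding fonts" above, the following sketch derives the candidate resource URIs from base-uri, name and type. The method name and the extension table are assumptions; only the "true-type" mapping to <code>.ttf</code> and <code>.xml</code> is taken from the example in the text.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Variant 2: base-uri + name + type dependent extensions.
final class FontVariant2Sketch {

    /** Returns the URIs to look for: the font file first, then the metrics file. */
    static List<String> candidateUris(String baseUri, String name, String type) {
        String prefix = baseUri.endsWith("/") ? baseUri : baseUri + "/";
        switch (type) {
            case "true-type":
                return Arrays.asList(prefix + name + ".ttf", prefix + name + ".xml");
            default:
                // Other types would need their own extension mapping.
                throw new IllegalArgumentException("no extension mapping for type: " + type);
        }
    }

    public static void main(String[] args) {
        for (String uri : candidateUris("file:///C:/WinNT/fonts", "ARIAL", "true-type")) {
            System.out.println(uri);
        }
    }
}
```

Generic parameters, as suggested in the last bullet under "Finding fonts", could then override individual entries of such a mapping.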

Rationales

Rationales for separating renderer and FO processor

  1. Different renderers have drastically different configuration data. While the Avalon configuration interface can easily cope with this, it doesn't seem appropriate for a simple, more Java Beans like configuration interface.
  2. The AWT renderer is even more different because it doesn't have an output stream.
  3. XML/SVG renderers might want to provide SAX and DOM results, which aren't output streams either.
  4. The renderer can be seen as an abstraction of the FO processor output.
  5. A structure renderer, if necessary, can be provided by the specific renderer.
  6. Actually, the current attempt to hide the renderer from the user brought us quite a few <code>if (renderer instanceof ...</code> checks.

Rationale for generic parameters for font and image resolvers

  1. Per URI specific data, provided by extension attributes or similar means, for example for cache control or authorisation data.
  2. Additional global data provided by extension elements
  3. Additional global data provided by generic configuration data

FOPAvalonization (last edited 2009-09-20 23:52:38 by localhost)