This page documents aspects of image handling in Apache FOP. Older design documentation can be found here: http://xmlgraphics.apache.org/fop/design/images.html

Current status

Some of the content below is slightly outdated. The "current problems" are mostly "past problems" now, since the image loader framework in XML Graphics Commons (http://xmlgraphics.apache.org/commons/image-loader.html) has been introduced. Performance and memory consumption have been improved as expected. However, the various renderers still handle images in different ways. For example, Barcode4J currently makes calls against the Graphics2DAdapter interface, the ImageAdapter interface and the PSRenderer class, and it can still fall back to SVG. The coupling is too high. The PDFRenderer also still takes a slightly different approach to image handling than, say, the PSRenderer. Now, with the new intermediate format, all the code that depends directly on the Renderer interface becomes a problem for code reuse.

Unification of image handling is being worked on as part of the implementation of the new intermediate format (AreaTreeIntermediateXml/NewDesign). The proposal is described on the ImageSupport/ImageHandler page.

Current problems

Certain renderers can embed images directly (JPEG, EPS and certain TIFF subformats, for example), but other renderers still require a decoded bitmap image. Currently, the cache only provides the first requested variant of an image. If the PDF renderer renders an FO file containing a JPEG image and the same document is then rendered with the Java2DRenderer, there will be a problem, because JPEGImage loaded only the original data and did not decode the JPEG image.
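To illustrate the JPEG case, here is a minimal sketch of a cache that stores one entry per URI and flavor, combined with a renderer-supplied priority list of flavors as proposed in the requirements further down this page. The class, enum and flavor names (`FlavorCacheSketch`, `Flavor.RAW_JPEG`, etc.) are hypothetical and do not correspond to the actual API of FOP or XML Graphics Commons; the loader function stands in for whatever decoding or conversion code would produce each flavor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch: an image cache with one entry per (URI, flavor) pair. */
class FlavorCacheSketch {

    /** Illustrative flavors only; not FOP's actual flavor constants. */
    enum Flavor { RAW_JPEG, DECODED_BITMAP, EPS }

    /** Cache key combining the image URI and the requested flavor. */
    static final class Key {
        final String uri;
        final Flavor flavor;

        Key(String uri, Flavor flavor) {
            this.uri = uri;
            this.flavor = flavor;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return uri.equals(other.uri) && flavor == other.flavor;
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, flavor);
        }
    }

    private final Map<Key, Object> cache = new HashMap<>();
    /** Produces an image in the given flavor, or null if that flavor cannot be delivered. */
    private final Function<Key, Object> loader;
    /** Keys that actually triggered a load, kept only to make the caching observable. */
    final List<Key> loads = new ArrayList<>();

    FlavorCacheSketch(Function<Key, Object> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    /** Returns the image in exactly this flavor; successful loads are cached, null results are not. */
    Object getImage(String uri, Flavor flavor) {
        Key key = new Key(uri, flavor);
        Object image = cache.get(key);
        if (image == null) {
            loads.add(key);
            image = loader.apply(key);
            if (image != null) {
                cache.put(key, image);
            }
        }
        return image;
    }

    /** Walks a renderer's prioritized flavor list and returns the first flavor that can be delivered. */
    Object getBestImage(String uri, List<Flavor> preferred) {
        for (Flavor flavor : preferred) {
            Object image = getImage(uri, flavor);
            if (image != null) {
                return image;
            }
        }
        throw new IllegalStateException("No supported flavor for " + uri);
    }
}
```

With such a key, the PDF renderer asking for raw JPEG data and the Java2DRenderer asking for a decoded bitmap no longer compete for the same cache entry.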

Another problem concerns color spaces. Details: http://www.nabble.com/Quality-of-included-image-degrades-when-png-rendering-is-used-tf2312286.html

Format support matrix

The following matrix tries to show all the possible combinations. A more graphical view of the whole thing is available here (SVG, 45KB): http://people.apache.org/~jeremias/fop/image-redesign-conversions.svg

JPEG

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||1:1 embedding|| ||
||PostScript||1:1 embedding||Requires PostScript Level 2||
||Java2D||decoded bitmap|| ||
||PCL||decoded bitmap|| ||
||AFP||decoded bitmap|| ||
||SVG||referenced or RFC2396 data URL|| ||
||RTF||1:1 embedding||1:1 embedding through FOP's own code. No support yet for decoding JPEG images through an image library.||

PNG

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||decoded bitmap, possibly 1:1 embedding|| ||
||PostScript||decoded bitmap, possibly 1:1 embedding for PS Level 3|| ||
||Java2D||decoded bitmap|| ||
||PCL||decoded bitmap|| ||
||AFP||decoded bitmap|| ||
||SVG||referenced or RFC2396 data URL|| ||
||RTF||1:1 embedding|| ||

BMP

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||decoded bitmap|| ||
||PostScript||decoded bitmap|| ||
||Java2D||decoded bitmap|| ||
||PCL||decoded bitmap|| ||
||AFP||decoded bitmap|| ||
||SVG||referenced or RFC2396 data URL|| ||
||RTF||1:1 embedding|| ||

GIF

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||decoded bitmap|| ||
||PostScript||decoded bitmap|| ||
||Java2D||decoded bitmap|| ||
||PCL||decoded bitmap|| ||
||AFP||decoded bitmap|| ||
||SVG||referenced or RFC2396 data URL|| ||
||RTF||decoded bitmap|| ||

TIFF

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||decoded bitmap||1:1 embedding for CCITT encoded images||
||PostScript||decoded bitmap||1:1 embedding for CCITT encoded images (NYI)||
||Java2D||decoded bitmap|| ||
||PCL||decoded bitmap|| ||
||AFP||1:1 embedding for CCITT encoded images|| ||
||SVG||referenced or RFC2396 data URL|| ||
||RTF||decoded bitmap||1:1 embedding with the help of Batik's TIFF codec plus FOP's own code||

SVG

fo:instream-foreign-object and fo:external-graphic

||Renderer||required/preferred variant||Comments||
||PDF||native conversion with Batik|| ||
||PostScript||native conversion with Batik|| ||
||Java2D||native conversion with Batik|| ||
||PCL||conversion to bitmap with Batik||HP/GL Graphics2D implementation available, but only for the simplest SVGs||
||AFP||conversion to bitmap with Batik||GOCA implementation in the works||
||SVG||referenced or embedded|| ||
||RTF||conversion to bitmap with Batik|| ||

For output formats (like PCL and RTF) for which no native conversion is available, we need an alternative that provides the SVG as a bitmap image. This is currently implemented in AbstractGenericSVGHandler and, for RTF, in SVGConverter.

For PDF, it would be interesting to have the natively painted picture placed into a Form XObject so such an image can be preprocessed and more easily reused. The difficulty lies in features like links, which would need to be handled separately since they are not part of a Form XObject.

Similarly, for PostScript, the SVG could be rendered as an EPS file which could be reused within the document.

EPS

fo:external-graphic only

||Renderer||required/preferred variant||Comments||
||PDF||embedded||EPS embedding in PDF is deprecated and not supported by newer Acrobat Readers||
||PostScript||embedded|| ||
||Java2D||not supported|| ||
||PCL||not supported|| ||
||SVG||not supported|| ||
||RTF||not supported|| ||

If a PostScript interpreter ever becomes available to FOP, we can support EPS images for other output formats. An alternative could be to extract the TIFF previews provided by certain EPS images, but this is better solved by using a better-suited image format.

The FOray project has the beginnings of a PostScript interpreter with a proof-of-concept implementation for rendering graphics, but making it usable would take a lot of work.

MathML

fo:instream-foreign-object and fo:external-graphic

MathML is internally converted to SVG in the MathML extension and subsequently handled as such, so see the SVG section for details; the same problems apply. The alternative is to render MathML directly using Java2D.

One small issue here: a math expression usually has a baseline. This baseline should be aligned with the FO baseline.

Barcode4J

fo:instream-foreign-object only

||Renderer||required/preferred variant||Comments||
||PDF||painted using Java2D or internally converted to SVG|| ||
||PostScript||internally converted to EPS|| ||
||Java2D||direct painting to Graphics2D|| ||
||PCL||direct painting to Graphics2D|| ||
||AFP||direct painting to Graphics2D||See Bugzilla #41995 (http://issues.apache.org/bugzilla/show_bug.cgi?id=41995) for an alternative using BCOCA||
||SVG||internally converted to SVG (NYI)|| ||
||RTF||internally converted to bitmap|| ||

The new FOP extension has been available since Barcode4J 2.0alpha1.

Other foreign XML formats

The easiest way is to convert such content to SVG internally and let the renderers handle that format. Examples for this section: the example plan extension, JCharts support, etc.

A better approach is to have those extensions paint directly to Java2D, which bypasses Batik and speeds things up.
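A minimal sketch of the Java2D route, assuming a hypothetical painter interface (`ForeignGraphicPainter`, loosely in the spirit of the Graphics2DAdapter idea but not FOP's actual interface): the extension paints once against `Graphics2D`, and output formats that need a bitmap (such as RTF or PCL) rasterize the same painter into a `BufferedImage`. `StripePainter` is a toy stand-in for something like a barcode.

```java
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Hypothetical painter contract; not FOP's actual API. */
interface ForeignGraphicPainter {
    /** Intrinsic size of the graphic (illustrative unit: pixels at scale 1). */
    Dimension getSize();

    /** Paints the graphic into the given area on any Graphics2D implementation. */
    void paint(Graphics2D g2d, Rectangle2D area);
}

/** Toy "extension": alternating black bars, standing in for something like a barcode. */
class StripePainter implements ForeignGraphicPainter {
    public Dimension getSize() {
        return new Dimension(40, 10);
    }

    public void paint(Graphics2D g2d, Rectangle2D area) {
        double barWidth = area.getWidth() / 8;
        g2d.setColor(Color.BLACK);
        for (int i = 0; i < 8; i += 2) {   // fill every other bar
            g2d.fill(new Rectangle2D.Double(area.getX() + i * barWidth, area.getY(),
                    barWidth, area.getHeight()));
        }
    }
}

/** Fallback for bitmap-only output formats: rasterize the same painter. */
class Rasterizer {
    static BufferedImage toBitmap(ForeignGraphicPainter painter, double scale) {
        Dimension size = painter.getSize();
        int w = (int) Math.ceil(size.width * scale);
        int h = (int) Math.ceil(size.height * scale);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2d = img.createGraphics();
        try {
            g2d.setColor(Color.WHITE);
            g2d.fillRect(0, 0, w, h);   // opaque white background
            g2d.scale(scale, scale);    // e.g. 2.0 for a higher-resolution bitmap
            painter.paint(g2d, new Rectangle2D.Double(0, 0, size.width, size.height));
        } finally {
            g2d.dispose();
        }
        return img;
    }
}
```

Renderers with a native Graphics2D implementation would call `paint(...)` directly; only bitmap-only formats pay for rasterization, and the `scale` parameter controls the resulting resolution.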

Requirements for the whole solution

  • Extensions which support foreign XML formats should be able to convert their content at least to SVG. Generating bitmaps is also desirable so output formats like RTF can also be supported. Rendering to Java2D may be preferred to SVG as it can reduce some overhead.
  • Renderers need to expose APIs to output directly supported formats other than bitmap formats: EPS for PostScript, and Graphics2D for at least the Java2DRenderer but possibly also for PDF and PS. See Graphics2DImagePainter/Graphics2DAdapter as an example of a solution which is already implemented for some renderers.

  • Different renderers support different source formats/flavors for the images to be embedded. The current cache only supports exactly one flavor, so rendering the same image with another renderer might result in problems.
    • The image cache should store one entry per URI and flavor.
    • Examples of possible flavors are: raw/undecoded, RenderedImage/BufferedImage, Graphics2DImagePainter, EPS, XML (SVG, MathML...), RFC2396 URL, etc.

    • Renderers would provide a prioritized list of supported/preferred flavors. The image package would then perform the necessary decoding/conversions and deliver the best flavor it can in a particular case.
  • During layout only the image dimensions need to be determined so the image can be properly placed. The actual image data only needs to be available during rendering. Ideally, the InputStream to load the dimensions from should be available to the component fully loading the image later on to avoid additional round-trips to fetch the image. Should additional flavors be needed, the InputStream can probably be reopened.

    • To handle all kinds of formats, we may need a special PushbackInputStream variant which supports arbitrarily sized buffers to reset a stream to position 0. PNG is a case where the resolution information of an image is not guaranteed to be within the first 4KB of the file. See also http://www.nabble.com/RereadableInputStream-tf4609782.html. The alternative (now chosen) is to use ImageIO's ImageInputStream, which allows caching of content already read, either in memory or in a temp file. Specialized implementations (like a ThresholdImageInputStream, switching from memory to file when a certain limit is reached) could be implemented if optimizations are needed.

    • When the XML-based intermediate format is used to represent the area tree, layout and rendering might happen in different VMs, so the actual image data might never be loaded at all. In that case, the InputStream still needs to be closed properly!

    • We may have to distinguish between fast and slow connections. If a resource is loaded from a local file, simply reopening the stream could be faster, since we can rely on the operating system's cache. For resources loaded over the (Inter)net, local buffering makes sense, be that in memory or even in a temporary file (as ImageIO likes to do).
    • All the little modules should be dynamically registrable. The current hard-coding in ImageFactory is bad.

    • We need transparent support for GZIPped content (like SVGZ).
    • Support baseline adjustments (for MathML).
    • Optional: Some people have told us in the past that they do dynamic image generation (for charts and such). We told them to implement this in a servlet, but it could be worthwhile (if easily implemented) to let a plug-in generate a dynamic image in some flavor (SVG, bitmap...) based on a given URI.
    • Optional: If possible, the package should be able to switch between multiple implementations for the same file format while parsing. We've had cases where ImageIO gave better results than our internal codecs and vice versa.
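The ImageInputStream approach chosen above (reading a header, then starting over at position 0 without reopening the source) can be sketched with plain `javax.imageio` classes. This is an illustration under assumptions, not FOP's loader code: `RewindSketch` and its methods are hypothetical, and the sketch always uses a memory-backed cache, whereas `ImageIO.createImageInputStream(...)` may choose a file-backed cache depending on `ImageIO.getUseCache()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical helper: peek at an image header, then hand the same stream on from position 0. */
class RewindSketch {

    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** Wraps a non-seekable source; bytes read are cached in memory so the stream can seek back. */
    static ImageInputStream open(InputStream raw) {
        return new MemoryCacheImageInputStream(raw);
    }

    /** Detects the format from the first bytes and leaves the stream at its previous position. */
    static String sniffAndRewind(ImageInputStream in) throws IOException {
        byte[] header = new byte[PNG_SIGNATURE.length];
        in.mark();              // remember the current position
        in.readFully(header);   // consumed bytes stay in the cache
        in.reset();             // jump back without reopening the source
        return Arrays.equals(header, PNG_SIGNATURE) ? "image/png" : "application/octet-stream";
    }
}
```

Because every byte read is retained, a loader can later seek to data beyond the first 4KB (such as a PNG resolution chunk) and back again; a threshold-based variant as suggested above would bound the memory this uses.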

Random thoughts

It might be good to separate the image dimension object for the layout process from the actual decoded image, thus providing separate caches for both. [DONE]
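Since layout only needs the image dimensions, such a dimension object can be filled by reading a few header bytes. A sketch for PNG, assuming a hypothetical `ImageInfoSketch` class (not FOP's actual image info class): the 8-byte PNG signature is followed by the IHDR chunk, whose first two fields are the width and height in pixels as big-endian 32-bit integers. Converting pixels to a physical size would also need the resolution, which, as noted above, may only appear later in the file.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical layout-time record: enough to place an image, no decoded pixels. */
final class ImageInfoSketch {

    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    final String uri;
    final int widthPx;
    final int heightPx;

    ImageInfoSketch(String uri, int widthPx, int heightPx) {
        this.uri = uri;
        this.widthPx = widthPx;
        this.heightPx = heightPx;
    }

    /** Reads only the signature and the start of the IHDR chunk (24 bytes); nothing is decoded. */
    static ImageInfoSketch fromPng(String uri, InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] signature = new byte[8];
        data.readFully(signature);
        if (!Arrays.equals(signature, PNG_SIGNATURE)) {
            throw new IOException("Not a PNG file: " + uri);
        }
        data.readInt();                          // chunk length (13 for IHDR)
        byte[] type = new byte[4];
        data.readFully(type);
        if (!"IHDR".equals(new String(type, StandardCharsets.US_ASCII))) {
            throw new IOException("IHDR must be the first chunk: " + uri);
        }
        int width = data.readInt();              // big-endian, as the PNG format specifies
        int height = data.readInt();
        return new ImageInfoSketch(uri, width, height);
    }
}
```

The decoded image, if it is ever needed, would live in a separate cache, so layout-only runs never pay for decoding.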

For high-volume PostScript environments (or PPML), it might be worthwhile not to load images at all but simply to insert resource placeholders (DSC comment %%IncludeResource) into the stream. This would speed up the rendering process considerably for environments where such an approach is possible. [DONE]
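A sketch of such a placeholder, assuming the target environment resolves DSC resource references. The `file` resource type is one of the types defined by the DSC conventions, but which type and resource names a given print workflow expects is an assumption here, as are the `DscPlaceholderSketch` class, its parameters, and the assumption that the resource paints into the unit square. A complete document would also announce the resource in a `%%DocumentNeededResources:` header comment.

```java
import java.io.IOException;
import java.io.Writer;

/** Hypothetical sketch: emit a DSC resource reference instead of embedding decoded image data. */
class DscPlaceholderSketch {

    /**
     * Positions and scales the coordinate system, then leaves a %%IncludeResource comment
     * where the image data would go. Assumes a DSC-aware spooler or print server substitutes
     * the named resource later and that it paints into the unit square; nothing is loaded here.
     */
    static void writeImagePlaceholder(Writer out, String resourceName,
                                      double x, double y, double width, double height)
            throws IOException {
        out.write("gsave\n");
        out.write(x + " " + y + " translate\n");
        out.write(width + " " + height + " scale\n");
        out.write("%%IncludeResource: file " + resourceName + "\n");   // DSC comments start a line
        out.write("grestore\n");
    }
}
```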

If it is known which renderer the document will be rendered to during the layout stage, the images could be loaded in a separate thread after the dimensions have been determined while the actual layout continues. [ignored for now]
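If that approach is ever picked up, a minimal sketch could look like this: layout calls `prefetch` once the dimensions are known, and the renderer later blocks only on images whose loading has not finished yet. `ImagePrefetcher` and the loader function are hypothetical; the sketch uses `CompletableFuture` from the JDK with a small fixed thread pool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical prefetcher: layout triggers full loads early, rendering collects the results. */
class ImagePrefetcher implements AutoCloseable {

    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, CompletableFuture<Object>> pending = new HashMap<>();
    private final Function<String, Object> fullLoader;

    ImagePrefetcher(Function<String, Object> fullLoader) {
        this.fullLoader = fullLoader;
    }

    /** Called once layout has the dimensions; starts the full load and returns immediately. */
    synchronized void prefetch(String uri) {
        pending.computeIfAbsent(uri,
                u -> CompletableFuture.supplyAsync(() -> fullLoader.apply(u), pool));
    }

    /** Called by the renderer; blocks only if the background load is still running. */
    Object get(String uri) {
        CompletableFuture<Object> future;
        synchronized (this) {
            future = pending.get(uri);
        }
        // Images that were never prefetched are loaded synchronously.
        return future != null ? future.join() : fullLoader.apply(uri);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Repeated `prefetch` calls for the same URI reuse the pending load, so an image referenced several times is only loaded once.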

We need to get rid of our byte array approach for storing decoded images. This should be done entirely using Java2D/AWT means, i.e. RenderedImage/BufferedImage. [DONE]
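For illustration, converting a legacy byte array into a `BufferedImage` might look as follows. The input layout (interleaved 8-bit RGB, row by row) and the `LegacyBitmapSketch` class are assumptions, not a description of FOP's former internal format. The resulting image carries its own `ColorModel`, and therefore its color space, instead of leaving that information implicit.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: wrap legacy interleaved RGB bytes in a BufferedImage. */
class LegacyBitmapSketch {

    /** Expects width * height * 3 bytes (R, G, B per pixel, row by row). */
    static BufferedImage fromInterleavedRgb(byte[] rgb, int width, int height) {
        if (rgb.length != width * height * 3) {
            throw new IllegalArgumentException("Expected " + (width * height * 3) + " bytes");
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = rgb[i++] & 0xFF;   // mask: Java bytes are signed
                int g = rgb[i++] & 0xFF;
                int b = rgb[i++] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }
}
```

In practice, images are usually obtained as a `BufferedImage` directly, for example via `ImageIO.read(...)`, so no intermediate byte array is involved at all.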

ImageSupport (last edited 2009-09-20 23:52:19 by localhost)