Differences between revisions 1 and 2
Revision 1 as of 2005-03-22 05:54:03
Size: 16076
Editor: anonymous
Comment: missing edit-log entry for this revision
Revision 2 as of 2009-09-20 23:52:34
Size: 16076
Editor: localhost
Comment: converted to 1.6 markup

This page was copied from http://xml.apache.org/fop/dev/fonts.html. Once we're done here the content should be put back to XML and published on the website.

Authors:


Glossary

output handler: A set of classes making up an implementation of an output format (i.e. not just the renderer, but for example the PDF Renderer plus the PDF library)

rendering run: One instance of the process of converting an XSL:FO document to a target format like PDF (or multiple target formats at once)

rendering instance: One instance of the process of converting an XSL:FO document to exactly one target format. Note that there may be multiple rendering instances that are part of one rendering run.

typeface: A set of bitmap or outline information that defines glyphs for a set of characters. For example, Arial Bold might be a typeface. This concept is often also called a "font", which we have defined somewhat differently (see below).

typeface family: A group of related typefaces having similar characteristics. For example, one typeface family might include the following typefaces: Arial, Arial Bold, Arial Italic, and Arial Bold Italic. A typeface family may be named in a way ambiguous with its members -- for example, the family mentioned in the previous sentence might also be named "Arial".

font: A typeface rendered at a specific size. For example -- Arial, Bold, 12pt.

Goals

  • refactor existing font logic for better clarity and to reduce duplication
  • The design should be in concert with the considerations for Avalonization
  • parse registered font metric information on-the-fly (to make sure most up-to-date parsing is used??)
  • resolve whether the FontBBox, StemV, and ItalicAngle font metric information is important or not -- if so, parse the .pfb (or .pfa) file to extract it when building the FOP xml metric file (Adobe Type 1 fonts only) [1]

  • handle fonts registered at the operating system (through AWT)
  • handle fonts that are simply available on the target format (Base 14 fonts for PDF and PostScript, Pre-installed fonts for PCL etc.)

  • Support various file-based font formats:
  • Allow for font substitution [3]
  • We probably have to support fixed-size fonts for several renderers: Text, maybe PCL, Epson LQ etc.
  • Optional: Make it possible to use multiple renderers in one run (create PDF and PS at the same time)
    • How important is that?

Issues

  • Why are we using our own font metric parsing and registration system, instead of the AWT system provided as part of Java?
    • Answer 0: We must handle default fonts for the target format, like the standard PDF fonts or default PS printer fonts, which may not be available either from the system/AWT or as a file.
    • Answer 1: Many of our customers use FOP in a so-called "headless" server environment -- that is, the operating system is operating in character mode, with no concept of a graphical environment. We need some mechanism of allowing these environments to get font information. [12]

    • Answer 2: At some level, we don't yet fully trust AWT to handle fonts correctly. There are still unresolved discrepancies between the two systems.
  • What about fonts for output formats using the structure handler (RTF, MIF)? Do they need access to the font subsystem? [4]
  • Supporting multiple output formats per rendering run has a few consequences.
    • The layout (line breaks, line height, page breaks etc.) is influenced by font metrics. Two fonts with the same name (but one TrueType and one Type 1) may have different font metrics, thus leading to a different layout. [5]

    • The set of available fonts cannot be provided by the renderers anymore. A central registry is needed. A selector has to decide which fonts are available for a set of renderers to be used. [6]
    • Two renderers (although using the same area tree) may produce slightly different looking output.
    • What to do when a font is not available to one of, say, two target output formats? Or what to do when a font is available from two font sources but each output handler supports only one of these (and the font metrics are different)? [7]
  • Font substitution: PANOSE comes to my mind. What's that exactly? [8] Can we use/implement that?

Design

Concern areas

There are several concern areas within FOP:

  • provision of font metrics to be used by the layout engine
  • registration and embedding of fonts by the output handlers (such as PDF)
  • Management of multiple font sources (file-based, AWT...)
  • Selection of fonts (fonts that can be used in a rendering run, substituted fonts)
  • Parsing of file-based fonts (Type 1, TrueType, OpenType etc.)

Thoughts

Central font registry

Until now each renderer set up a FontInfo object containing all available fonts. Some renderers then used the font setup from other renderers (ex. PostScript the one from PDF). That indicates that there are merely various font sources from which output handlers support a subset.

So the font management should be extracted from the renderers and made standalone, hence the idea of a central font registry. The font registry would manage various font sources (AWT fonts, Type 1 fonts, TrueType fonts). Then, there's an additional layer needed between the font registry and the layout engine. I call it the font selector (for now), because it determines the subset of fonts available to the layout engine based on the set of output handlers (one or more) to be used in a rendering run. It could also do font substitution as the layout engine looks up fonts and selects the most appropriate font. If a font is available from more than one font source the font selector should select the one favoured by the output handler(s). This means that we need some way for the output handlers to express a priority on font sources.

The font selector will have to pass to the output formats which fonts have been used during layout.
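
To make the registry/selector split more concrete, here is a minimal sketch of how it might look. All class, record, and method names are hypothetical and chosen only for illustration; none of them are existing FOP classes. The sketch assumes that each output handler expresses its priority as an ordered list of font sources, and that a face is usable only if every handler in the rendering run supports its source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// All names below are hypothetical illustrations, not existing FOP classes.

/** Kinds of font source mentioned in the text. */
enum FontSource { AWT, TYPE1, TRUETYPE }

/** One typeface as offered by one font source. */
record RegisteredFace(String family, String style, int weight, FontSource source) {}

/** An output handler and the font sources it supports, most preferred first. */
record OutputHandler(String name, List<FontSource> preferredSources) {}

/** Central registry: holds every face from every source, independent of any renderer. */
final class FontRegistry {
    private final List<RegisteredFace> faces = new ArrayList<>();

    void register(RegisteredFace face) { faces.add(face); }

    List<RegisteredFace> faces() { return Collections.unmodifiableList(faces); }
}

/** Selector: sits between the registry and the layout engine for one rendering run. */
final class FontSelector {
    private final FontRegistry registry;
    private final List<OutputHandler> handlers;
    private final Set<RegisteredFace> used = new LinkedHashSet<>();

    FontSelector(FontRegistry registry, List<OutputHandler> handlers) {
        this.registry = registry;
        this.handlers = handlers;
    }

    /** Returns the matching face that every handler supports and that ranks best overall. */
    Optional<RegisteredFace> select(String family, String style, int weight) {
        RegisteredFace best = null;
        int bestRank = Integer.MAX_VALUE;
        for (RegisteredFace face : registry.faces()) {
            boolean matches = face.family().equals(family)
                    && face.style().equals(style)
                    && face.weight() == weight;
            if (!matches) continue;
            int rank = rank(face.source());
            if (rank >= 0 && rank < bestRank) {
                best = face;
                bestRank = rank;
            }
        }
        if (best != null) used.add(best);
        return Optional.ofNullable(best);
    }

    /** Sum of the source's positions in each handler's list, or -1 if a handler lacks it. */
    private int rank(FontSource source) {
        int total = 0;
        for (OutputHandler handler : handlers) {
            int index = handler.preferredSources().indexOf(source);
            if (index < 0) return -1;
            total += index;
        }
        return total;
    }

    /** Faces handed out during layout, to be reported to the output handlers afterwards. */
    Set<RegisteredFace> usedFaces() { return Collections.unmodifiableSet(used); }
}
```

With a PDF handler that prefers Type 1 over AWT and a PostScript handler that supports only Type 1, a family registered from both sources would resolve to the Type 1 face. Font substitution is deliberately left out of the sketch; an empty result is where it would hook in.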

Common Resources

There are some possible efficiencies to be gained by using one FOP session to generate multiple documents, and by generating multiple output formats for one document. However, to accomplish this, we probably need to explicitly distinguish which resources are available to / used by a Session, a Document (rendering run), and a Rendering Instance.

Proposed Interface for Fonts (wvm)

We wish to design a unified font interface that will be available to all other sections of FOP. In other words, the interface will provide all needed information about the font to other sections of FOP, while hiding all details about what type of font it is, etc.

{{{
/**
 * Hidden from FOP-general
 */
package class TypeFaceFamily

/**
 * Hidden from FOP-general.
 * Implementations of this for AWT, Base14 fonts, Custom, etc.
 * All implementations of this interface are hidden from FOP-general
 */
package interface TypeFace
    package getEmbeddingStream() {}

public class Font {
    private TypeFace typeface;

    /**
     * Consults list of available TypeFace, and Font objects in the Session,
     * and returns appropriate Font if it exists, otherwise creates it, updating Session
     * and Document lists to keep track of available fonts and embedding information
     */
    public static Font provideFont(Session session, String family, String style, int weight, int size) {}

    public static StreamOfSomeSort getFontRendering(Document document) {}

    /** some accessor methods **/
    String getTypeFace();
    String getStyle();
    int getWeight();
    int getSize();

    /**
     * The following methods are either computed by the implementations of the TypeFace
     * interface, or are computed based on information returned from the TypeFace interface
     */
    //These methods already take font size into account
    int getAscender();
    int getDescender();
    int getCapHeight();
    //more...
    int getWidth(char c);
    boolean hasKerningAvailable();
    //more...
}
}}}

Jeremias and I (wvm) have gone around a bit about whether the Font should be a class or an interface. The place where the interface is needed is at the TypeFace level, which is where the differences between AWT, TrueType custom, PostScript custom, etc. exist. The rest of FOP needs only the following:

  • the provideFont() method to obtain a Font object which can provide metrics information
  • a way to get the actual font for embedding purposes. This is done through the FontFamily interface, which has methods for returning needed embedding information. However, the interface is never exposed to FOP-general. Instead, a static method (in Font) handles all of that.

  • collections of fonts used, fonts to be embedded, etc. are stored in the Session and Document concept objects respectively, where they can be obtained by getFontRendering()

So while the interface for TypeFace is good (to handle the many variations), any attempt to expose it to FOP-general makes FOP-general's interaction with fonts more complex than it needs to be.
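
As a rough illustration of the class-versus-interface argument, the sketch below keeps TypeFace as a hidden interface and exposes only a concrete Font class whose static provideFont() method returns the same object for the same request. The FontKey record, the fields of Session, the resolveTypeFace() helper, and the placeholder ascender value are assumptions made for this example. It also assumes sizes are expressed in millipoints and typeface metrics in 1/1000 of the font size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the concepts in the proposal; not existing FOP classes.

/** Hidden typeface abstraction; metric values are in 1/1000 of the font size. */
interface TypeFace {
    String familyName();
    int ascender();
}

/** Lookup key: the basic information passed in by the layout classes. */
record FontKey(String family, String style, int weight, int size) {}

/** Session concept: owns the Font objects created so far. */
final class Session {
    final Map<FontKey, Font> fonts = new HashMap<>();
}

/** Concrete class exposed to the rest of FOP; TypeFace stays behind it. */
final class Font {
    private final TypeFace typeface;
    private final FontKey key;

    private Font(TypeFace typeface, FontKey key) {
        this.typeface = typeface;
        this.key = key;
    }

    /** Returns the cached Font for this request, creating and recording it on first use. */
    static Font provideFont(Session session, String family, String style, int weight, int size) {
        FontKey key = new FontKey(family, style, weight, size);
        return session.fonts.computeIfAbsent(key, k -> new Font(resolveTypeFace(k), k));
    }

    /** Hypothetical lookup; a real one would consult AWT, Base 14, or custom font sources. */
    private static TypeFace resolveTypeFace(FontKey key) {
        final String family = key.family();
        return new TypeFace() {
            @Override public String familyName() { return family; }
            @Override public int ascender() { return 700; } // placeholder, not a real metric
        };
    }

    String getTypeFace() { return typeface.familyName(); }
    String getStyle() { return key.style(); }
    int getWeight() { return key.weight(); }
    int getSize() { return key.size(); }

    /** Ascender scaled to the font size; the result is in the same unit as the size. */
    int getAscender() { return typeface.ascender() * key.size() / 1000; }
}
```

Because callers only ever hold a Font, swapping the TypeFace implementation (AWT, Type 1, TrueType) would not change any code in the layout engine.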

Hardware fonts

Definition: "hardware" fonts are fonts implicitly available on a target platform (printer or document format) without the need to do anything to make them available. Examples: PDF and PostScript specifications both define a Base 14 font set which consists of Helvetica, Times, Courier, Symbol and ZapfDingbats. PCL printers also provide a common set of fonts available without uploading anything.

The Base 14 fonts are already implemented in FOP. We have their font metrics as XML files that will be converted to Java classes through XSLT. The same must be done for all other "hardware" fonts supported by the different output handlers. The layout engine simply needs some font metrics to do the layout.
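
For illustration only, a metrics class for a hardware font could be as small as the sketch below. The class name and method signatures are hypothetical and do not reflect what the XSLT step actually generates. Courier is used because it is monospaced: every glyph in its standard metrics has an advance width of 600 units, where 1000 units equal the font size.

```java
// Hypothetical shape of a metrics class for a "hardware" font; not generated FOP code.
final class CourierMetrics {
    // Advance width shared by all Courier glyphs, in 1/1000 of the font size.
    private static final int ADVANCE_WIDTH = 600;

    String fontName() { return "Courier"; }

    /** Width of one character; the result is in the same unit as the size argument. */
    int getWidth(char c, int size) {
        return ADVANCE_WIDTH * size / 1000;
    }

    /** Width of a run of text, ignoring kerning (Courier has none to apply). */
    int getTextWidth(String text, int size) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            total += getWidth(text.charAt(i), size);
        }
        return total;
    }
}
```

No font file is needed for this: the layout engine gets its widths from the table, and the output handler simply refers to the font by name.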

OpenType Fonts

OpenType font files come in three varieties: 1) TrueType font files, 2) TrueType collections, and 3) Wrappers around CFF (PostScript) fonts. The first two varieties can be treated just like normal TTF and TTC files. The font metric information for all three is stored in identical (i.e. TrueType) tables. The CFF files require a small amount of additional logic to unwrap the contents and find the appropriate tables.
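
The three varieties can be told apart from the first four bytes of the file, which hold a big-endian version tag: 0x00010000 for TrueType outlines, 'OTTO' for CFF outlines, and 'ttcf' for a TrueType collection. The following sketch shows one way to dispatch on that tag. The enum and class names are hypothetical, and the sketch looks only at the header; it does not parse any tables.

```java
import java.nio.ByteBuffer;

// Hypothetical helper names; only the tag values come from the font format itself.
enum OpenTypeVariety { TRUETYPE, TRUETYPE_COLLECTION, CFF_WRAPPER, UNKNOWN }

final class OpenTypeSniffer {
    private static final int TAG_TRUETYPE = 0x00010000; // version 1.0, TrueType outlines
    private static final int TAG_OTTO = 0x4F54544F;     // 'OTTO', CFF outlines
    private static final int TAG_TTCF = 0x74746366;     // 'ttcf', TrueType collection

    /** Classifies a font file from its first four bytes. */
    static OpenTypeVariety detect(byte[] header) {
        if (header == null || header.length < 4) {
            return OpenTypeVariety.UNKNOWN;
        }
        // ByteBuffer reads big-endian by default, matching the font file byte order.
        int tag = ByteBuffer.wrap(header, 0, 4).getInt();
        switch (tag) {
            case TAG_TRUETYPE: return OpenTypeVariety.TRUETYPE;
            case TAG_OTTO:     return OpenTypeVariety.CFF_WRAPPER;
            case TAG_TTCF:     return OpenTypeVariety.TRUETYPE_COLLECTION;
            default:           return OpenTypeVariety.UNKNOWN;
        }
    }
}
```

A parser could use the result to decide whether to reuse the existing TTF/TTC code path or to take the extra CFF unwrapping step first.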

How to define a "font" in various contexts

A "font" can have various definitions in different contexts:

  • The layout engine needs a font defined as: font-family (Helvetica), font-style (oblique), font-weight (bold), font-size (12pt). In this context the layout engine will also use information like text-decoration (underline etc.).
  • Font sources will probably deal with fonts defined by: font-family, font-style, font-weight. But there are things to consider:
    • Type 1 fonts normally define a font exactly this way (Example: The files FTR_.pfb and FTR_.pfm together define the "Frutiger 55 Roman" font).

    • But the Type 1 spec defines a multiple master extension allowing more than one flavour of a font in one font file (ex. a regular and a bold font). I think the same is possible for TrueType. To handle fonts like this in a family/style/weight manner, we would need a facade that points to a particular flavour of a multiple master font, so one such font would result in multiple facades pointing to it. [9] [10]

    • These fonts are generally scalable (at least AWT, T1, TT and OT are). What if we need to support fixed-size fonts for a text, PCL or Epson LQ renderer? [11]
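
One possible answer to the fixed-size question, following wvm's note that a text renderer could take the first point size it is given and reject any other, is sketched below. The class name and the choice of exception are assumptions for this example, not part of the proposal.

```java
// Hypothetical guard for renderers that can only output one fixed font size.
final class FixedSizeFontGuard {
    private int lockedSize = -1; // -1 means no size has been requested yet

    /** Locks onto the first size requested and rejects any different size afterwards. */
    int acceptSize(int requestedSize) {
        if (requestedSize <= 0) {
            throw new IllegalArgumentException("Font size must be positive: " + requestedSize);
        }
        if (lockedSize < 0) {
            lockedSize = requestedSize;
        } else if (requestedSize != lockedSize) {
            throw new IllegalArgumentException(
                    "Fixed-size renderer already uses size " + lockedSize
                    + ", cannot switch to " + requestedSize);
        }
        return lockedSize;
    }
}
```

Whether a mismatch should be a hard error or a logged warning with fallback to the locked size is a policy decision the sketch leaves open.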


[1] It is important. Because these values are used to create the font descriptor in PDF. If these values are wrong you get error messages from Acrobat Reader. (jm) See http://xml.apache.org/fop/fonts.html where this issue is discussed in a note. If I understand the note correctly, we need to read in the pfb file to get this information. (wvm)

[2] Actually OpenType is the unified next generation font format that also replaces Type 1 Fonts. An OpenType font contains more information than either a Type 1 or TrueType font contains. It can wrap either of the two methods for describing the font outline information. (wvm)

[3] Please define what is meant by this term. Are we talking about which font to use when we can't find the one that is requested? (wvm)

[4] I am not sure about RTF, but I think that MIF will need some access to this information. (wvm) <wvm date="20030718"> OK, I think I was wrong about this. The StructureRenderers should be able to get everything they need about fonts directly from the XSL-FO input. If they need to aggregate similar fonts, or track which ones have been used, they should do that themselves.</wvm>

[5] My thought is that this should never happen. If the font registry is centralized, then when "XYZ, bold, 12pt" is requested, the same font should be selected every time. (wvm)

[6] I envision this information to be stored in the appropriate objects -- Session, Document, or RenderingInstance (these are concepts, not class names, because classes filling these concepts may already exist). Session should either be static or a singleton, and includes a list of all fonts (actually probably typefaces) used in this session. Document may not need to list anything, but RenderingInstance needs to know which fonts need to be embedded, among other things. (wvm)

[7] I think I am against allowing this. We would first need to resolve how to register hardware fonts & get their metric information, which seems almost impossible. Then we would have to build a mechanism that maps font sources to output media -- PDF can use software fonts, but not hardware. PCL can use hardware, and depending on the printer, perhaps can use downloadable software fonts as well. This seems like an ugly, slippery slope, at least for a 1.0 release. I think it better to say that we support only soft fonts, and let the user build a workaround. The other really ugly aspect of this is that if you allow two different fonts to be used for two different rendering contexts, I think you have to have two different area trees to handle the layout differences. (wvm)

[8] See http://www.w3.org/Fonts/Panose/pan2.html. (wvm)

[9] Actually, multiple masters are used to generate specific instances of .pfm and .pfb files, so these live in separate files. We do have an issue with .cff (Compact Font Format, which contains multiple Type 1 faces), and .ttc (TrueType Collection, which contains multiple TrueType faces). Until we have parsing tools, these are really unusable to us. OpenType fonts have native support for multiple typefaces within a font file, and I think they support both of these formats. (wvm)

[10] With regard to the design aspect, when a font object is requested by the layout classes, the same object should always be returned for the same basic information that is passed. (wvm)

[11] A fixed-size font can be thought of as merely an instance at a specific point size of a typeface. So, for text, we probably need to take the first point size passed to us & use that throughout, spitting out an error if other point sizes are subsequently used. For the others, I suppose we have to first resolve the issues of whether/how to support hardware fonts. (wvm)

[12] I don't think this is really a strong argument unless we refrain from using Batik for SVGs too. (pij) Response from wvm: <wvm> I don't understand this comment. The point is that we use our own registry for fonts because it is the only way to get font information in a headless environment. </wvm>

FOPFontSubsystemDesign (last edited 2009-09-20 23:52:34 by localhost)