Thoughts on necessary changes in FOP and pointers into the code
Notes on AFP Fonts
Unicode Fonts in AFP
This section describes the necessary steps to support Unicode fonts in AFP. "Unicode fonts" in AFP jargon are outline fonts (CID Keyed font (Type 0)). They define their glyphs using GCUIDs (Graphic Character UCS Identifiers), for example "U0000061" for Unicode 0x0061 (LATIN SMALL LETTER A, "a").
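To make the GCUID naming concrete, here is a minimal sketch that formats a Unicode code point the way the example above does ("U" followed by seven hexadecimal digits). The pattern is inferred only from that single example, and the class and method names are hypothetical; real font resources may use other conventions that this sketch does not cover.

```java
final class GcuidSketch {
    /**
     * Formats a code point as "U" plus seven uppercase hex digits.
     * Assumption: pattern inferred from the "U0000061" example in the text.
     */
    static String gcuidFor(int codePoint) {
        return String.format("U%07X", codePoint);
    }

    public static void main(String[] args) {
        // LATIN SMALL LETTER A, the example used in the text
        System.out.println(gcuidFor(0x0061));
    }
}
```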
As an example, we're going to take the font "J-Heisei Mincho Unicode". It's contained in an AFP Resource called "CZJHMNU". The font contains a large number of glyphs identified by GCUIDs and in Type 0 format.
The font comes with a code page file "T11200" which uses double-byte UCS representation. That basically enables using UTF-16BE for character data in PTOCA.
So, how can FOP now use this?
The first thing we need is a coded font resource (BCF/ECF) called "XZJHMNU" which combines "CZJHMNU" and "T11200".
CB> Since fop.xconf works with character set (CZJHMNU) and code page (T11200) files, the coded font should be autogenerated on the fly and embedded into the AFP file. That way we don't need to change fop.xconf to work with the coded font XZJHMNU.
Later in the active environment group of the page, a Map Coded Font (MCF) is needed to reference "XZJHMNU" and scale the font to the requested size.
Finally, in PTOCA, the TRN data can simply be UTF-16BE encoded character data.
Changes in FOP for Unicode Fonts
We need a new class that represents a "CID Keyed font (Type 0)". So far, we have support for bitmap fonts (BitmapFont.java) and Type 1 outline fonts (OutlineFont.java). For the new font type, we need another subclass of org.apache.fop.afp.fonts.AFPFont. Maybe OutlineFont can even be reused since the Unicode fonts are very similar to the normal outline fonts.
Until now, we have only had single-byte encodings. The actual encoding of the characters happens in org.apache.fop.afp.fonts.CharacterSet (the last few methods). For "T11200", the Java "UTF-16BE" encoding could be used, specified in the configuration. We don't build a character map from the code page resource yet; doing so would be cleaner and more versatile but would take more work.
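As a rough sketch of the configuration-driven approach, the encoding step can simply delegate to whatever Java charset name is configured for the code page. The class ConfiguredEncoder and its method are hypothetical stand-ins for illustration, not the actual CharacterSet API.

```java
import java.nio.charset.Charset;

/** Hypothetical stand-in for the encoding part of CharacterSet. */
final class ConfiguredEncoder {
    private final Charset charset;

    /** @param encodingName Java charset name from the configuration, e.g. "UTF-16BE" */
    ConfiguredEncoder(String encodingName) {
        this.charset = Charset.forName(encodingName);
    }

    /** Encodes the characters with the configured charset. */
    byte[] encodeChars(CharSequence chars) {
        return chars.toString().getBytes(charset);
    }

    public static void main(String[] args) {
        ConfiguredEncoder encoder = new ConfiguredEncoder("UTF-16BE");
        byte[] bytes = encoder.encodeChars("a");
        // Two bytes per BMP character, high byte first, no byte order mark
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]); // prints 00 61
    }
}
```

With a single-byte code page, the same class would be constructed with the corresponding single-byte charset name, so only the configuration changes.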
The coded font (BCF/ECF) is not implemented in FOP yet. Maybe this whole thing also works without a separate coded font: to date, we combine font and code page in the MCF. It is not clear whether this will still work for Unicode fonts, so adding support for BCF/ECF is recommended. The changes to org.apache.fop.afp.modca.MapCodedFont should be straightforward and similar to those for a normal outline font.
For the PTOCA TRN data, the biggest difference is that we're no longer encoding EBCDIC data but UTF-16BE data.
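The following sketch shows what writing UTF-16BE text as TRN control sequences could look like. It is not FOP's actual PTOCA code; the class and method names are hypothetical. The byte values are assumptions taken from my reading of the PTOCA control sequence layout: prefix 0x2B, class 0xD3, a one-byte length that counts itself and the function type, and 0xDA as the unchained TRN function type.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class Utf16TrnSketch {
    // Assumption: the length byte allows at most 253 data bytes per TRN.
    // 252 is the largest even value, so 2-byte code units are never split.
    static final int MAX_DATA = 252;

    /** Encodes the text as UTF-16BE and wraps it in unchained TRN sequences. */
    static byte[] buildTrn(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_16BE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int pos = 0; pos < data.length; pos += MAX_DATA) {
            int len = Math.min(MAX_DATA, data.length - pos);
            out.write(0x2B);           // control sequence prefix (assumed)
            out.write(0xD3);           // control sequence class (assumed)
            out.write(len + 2);        // length: length byte + type byte + data
            out.write(0xDA);           // TRN, unchained (assumed)
            out.write(data, pos, len); // UTF-16BE character data
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : buildTrn("a")) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints 2B D3 04 DA 00 61
    }
}
```

The sketch splits long text only on 2-byte boundaries; it does not try to keep the two halves of a surrogate pair in the same TRN, which a real implementation might need to consider.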