Revision 1 as of 2005-04-01 20:45:20
Failure to use OOo hyphenation patterns in FOP
I tried to use the hyphenation pattern files of OpenOffice.org (OOo) in FOP. The format of these files seems to be called 'ALTLinux' (see http://lingucomponent.openoffice.org/hyphenator.html). In this format the first line names the encoding of the file. Each following line contains one pattern, written exactly as in the pattern element of a FOP XML hyphenation pattern file. ALTLinux files have no classes and no exceptions.
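The file layout described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code I used with FOP: the class name AltLinuxPatterns and its methods are hypothetical, the sketch takes the file content as an already decoded string (a real reader would have to decode the bytes according to the encoding named on the first line), and skipping blank lines is my own assumption about the format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the content of an ALTLinux-style pattern file. */
final class AltLinuxPatterns {
    private final String encoding;
    private final List<String> patterns;

    private AltLinuxPatterns(String encoding, List<String> patterns) {
        this.encoding = encoding;
        this.patterns = Collections.unmodifiableList(patterns);
    }

    /** Encoding name taken from the first line of the file. */
    String encoding() {
        return encoding;
    }

    /** One entry per pattern line, in file order. */
    List<String> patterns() {
        return patterns;
    }

    /**
     * Splits already decoded file content into the encoding line and the
     * pattern lines. Blank lines are skipped (an assumption of this sketch).
     */
    static AltLinuxPatterns parse(String content) {
        String[] lines = content.split("\r?\n");
        if (lines.length == 0 || lines[0].trim().isEmpty()) {
            throw new IllegalArgumentException("missing encoding line");
        }
        List<String> patterns = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (!line.isEmpty()) {
                patterns.add(line);
            }
        }
        return new AltLinuxPatterns(lines[0].trim(), patterns);
    }
}
```

Each string returned by patterns() could then be handed to whatever builds the pattern tree, in the same way as the content of a pattern element from a FOP XML file.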
Parsing the format and building a HyphenationTree object from it was not difficult. But when I used the result, no word was hyphenated. This turned out to be due to the absence of classes.
A class is a set of characters that are equivalent with respect to hyphenation. Almost all classes consist of a lower case character and the corresponding upper case character. FOP also uses the classes for a second purpose besides equivalence: all characters listed in a class are considered letters, and all other characters non-letters. A word containing a non-letter is not hyphenated. An ALTLinux hyphenation pattern file does not define any letters, and therefore no word is ever hyphenated.
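The double role of the classes can be illustrated with a small model. This is a simplified sketch and not FOP's actual implementation: the class LetterClasses and its methods are hypothetical, and I assume that each class is given as a string such as "aA" whose first character serves as the representative of the class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of character classes: equivalence plus letter definition. */
final class LetterClasses {
    // Maps every character listed in a class to the representative of that class.
    private final Map<Character, Character> representative = new HashMap<>();

    /** Registers one class, e.g. "aA"; the first character is the representative. */
    void addClass(String members) {
        if (members.isEmpty()) {
            return;
        }
        char rep = members.charAt(0);
        for (char c : members.toCharArray()) {
            representative.put(c, rep);
        }
    }

    /** A character is a letter only if it is listed in some class. */
    boolean isLetter(char c) {
        return representative.containsKey(c);
    }

    /**
     * Replaces each character by its class representative.
     * Returns null if the word contains a non-letter, meaning it is not hyphenated.
     */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character rep = representative.get(word.charAt(i));
            if (rep == null) {
                return null;
            }
            sb.append(rep.charValue());
        }
        return sb.toString();
    }
}
```

With no classes registered, which is the situation after loading an ALTLinux file, normalize returns null for every non-empty word, so in this model nothing is ever hyphenated.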
All West European languages have a-z and A-Z as letters, but they differ in which characters of the accented range they count as letters. Russian and other languages written in Cyrillic script, of course, deviate completely from this template. Therefore it does not seem feasible to hard-code a definition of letters in the program.
The JTextCheck framework (http://linux.org.mt/projects/jtextcheck/index.html), with the OOo hyphenation plugin, is evidently able to work with the OOo patterns. It delivers the hyphenation points for the words in a string of text. That means it provides a service rather similar to that of FOP's hyphenation code, and it should be possible to use this framework instead of, or in parallel with, FOP's hyphenation code. But I do not see a sufficient need to justify the coding effort required to make this work.