White Space Handling in the XSL FO spec

Some thoughts about the concerns

The FO spec must address the following three concerns:

  1. What to do with linefeed characters in the input: consider as space or as a real linefeed?
  2. What to do with XML white space characters other than linefeed in the input: preserve or collapse?

These two concerns are governed by the properties linefeed-treatment and white-space-collapse.

Together these two items address the matter of pretty printing of XML documents (in this case FO documents).

  3. What to do with white space and other eligible characters around line breaks?

This concern is governed by the properties white-space-treatment and suppress-at-line-break.

XML itself has a prescription for dealing with white space in the input XML file: The parser must report whether white space occurs in element content or not, allowing applications to ignore it in element content; in SAX terms, white space in element content is ignorable white space.

Because FO does not have a DTD or schema, there is no element content, and all white space is passed on to the FO processor. FO does have its own equivalent of element content. When white space occurs in flow objects which do not take PCDATA as children, it is ignored by the FO processor. White space in flow objects that take PCDATA children, however, must be taken into account. Its interpretation is governed by the first two items.
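The distinction between flow objects that take PCDATA children and those that do not can be sketched as follows. This is only an illustration, not how any particular FO processor works: the helper name drop_ignorable_white_space is hypothetical, and the set TAKES_PCDATA is a deliberately incomplete subset chosen for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# Assumption: an illustrative, incomplete subset of the flow objects
# that accept PCDATA children.
TAKES_PCDATA = {f"{{{FO_NS}}}block", f"{{{FO_NS}}}inline"}

XML_WHITE_SPACE = " \t\r\n"


def drop_ignorable_white_space(elem):
    """Remove white-space-only text from elements that do not take PCDATA."""
    takes_text = elem.tag in TAKES_PCDATA
    if not takes_text and elem.text is not None and not elem.text.strip(XML_WHITE_SPACE):
        elem.text = None
    for child in elem:
        drop_ignorable_white_space(child)
        # The tail of a child is part of the content of its parent.
        if not takes_text and child.tail is not None and not child.tail.strip(XML_WHITE_SPACE):
            child.tail = None


SAMPLE = (
    f'<fo:flow xmlns:fo="{FO_NS}">\n'
    "  <fo:block>\n    text\n  </fo:block>\n"
    "</fo:flow>"
)

root = ET.fromstring(SAMPLE)
drop_ignorable_white_space(root)
```

After the call, the indentation inside fo:flow is gone, while the white space inside fo:block is still present and remains subject to linefeed-treatment and white-space-collapse.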

Pretty printing can also occur inside PCDATA. Editors commonly break long stretches of text into separate lines, substituting space characters with linefeed characters. They also commonly indent the lines to illustrate the nesting position of the element containing the PCDATA, replacing single spaces with sequences of spaces and tab characters. The properties that govern the first two concerns also serve to undo those pretty printing effects in the output of the FO processor.

The first two items are concerned with input. Therefore they can in principle be taken care of at the refinement stage.
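A minimal sketch of such a refinement step, applied to a single run of PCDATA, might look as follows. The function name refine is hypothetical, and the sketch simplifies the spec: it handles only three of the linefeed-treatment values, and it replaces every run of collapsible white space with one space character rather than keeping the first character of the run.

```python
import re


def refine(text, linefeed_treatment="treat-as-space", white_space_collapse=True):
    """Apply simplified linefeed-treatment and white-space-collapse to PCDATA."""
    if linefeed_treatment == "treat-as-space":
        text = text.replace("\n", " ")
    elif linefeed_treatment == "ignore":
        text = text.replace("\n", "")
    elif linefeed_treatment != "preserve":
        raise ValueError(f"unsupported linefeed-treatment: {linefeed_treatment}")
    if white_space_collapse:
        # Simplification: each run of space, tab and carriage return
        # characters becomes a single space; preserved linefeeds are kept.
        text = re.sub(r"[ \t\r]+", " ", text)
    return text


pretty_printed = "Some text\n      continued\there"
print(refine(pretty_printed))  # prints: Some text continued here
```

With linefeed-treatment="preserve" and white_space_collapse=False the input is returned unchanged, which corresponds to output that keeps the pretty printing.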

The third item is concerned with input characters whose representation depends on the layout, namely characters that are suppressed when they occur before and/or after a line break. Therefore it can only be taken care of when the line breaks are known, i.e. at the layout or area building stage.
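The dependence on layout can be illustrated with a toy line breaker. Both functions below are hypothetical, and the greedy breaking by character count is an assumption made for the example; real layout works with glyph widths. The point is only that the same space characters are suppressed or kept depending on where the breaks fall.

```python
import re


def break_lines(text, width):
    """Greedy breaker: start a new line when the next word would not fit."""
    lines, current = [], ""
    for token in re.findall(r" +|[^ ]+", text):
        is_word = bool(token.strip())
        if is_word and current.strip() and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines


def suppress_at_breaks(lines):
    """Drop the spaces that ended up at the start or end of a line."""
    return [line.strip(" ") for line in lines]


text = "white space  around breaks"
narrow = suppress_at_breaks(break_lines(text, 12))
wide = suppress_at_breaks(break_lines(text, 20))
print(narrow)  # prints: ['white space', 'around', 'breaks']
print(wide)    # prints: ['white space  around', 'breaks']
```

In the narrow layout the two spaces after "space" fall at a line break and disappear; in the wide layout they stay inside the line and are kept.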

The formulation of this concern was flawed in version 1.0 of the FO spec: instead of line breaks, it mentions linefeed characters. This is clearly not what is needed. Users expect white space to be suppressed around line breaks, and FO processors do this, even though the spec has no good prescription for this behaviour. Version 1.1 of the FO spec tries to correct this, but the result is that the property white-space-treatment has a mixed behaviour: two of its values refer to input characters and can be taken care of at the refinement stage, while the other three refer to suppression as a result of layout and must be taken care of at the layout or area building stage.

Remarks on white-space-collapse

white-space-collapse is formulated in terms of flow objects, so that it only applies to direct siblings. This can give rise to undesirable effects. Examples:

  1. Spaces before an fo:inline and spaces at the start of an fo:inline are not collapsed, perhaps contrary to the expectation of the user.
  2. fo:marker elements may have spaces at their start and end, which may become adjacent to spaces before and after the fo:retrieve-marker that inserted the fo:marker content. These spaces are not collapsed, again perhaps contrary to the expectation of the user.
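The first example can be made concrete with a small sketch that contrasts the two readings. The function names are hypothetical, and collapsing is reduced to plain space characters; the first function models the sibling-only interpretation described above, the second models what a user might expect.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = (
    '<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">'
    "before <fo:inline> inside</fo:inline>"
    "</fo:block>"
)


def collapse(text):
    """Replace each run of spaces with a single space."""
    return re.sub(r" +", " ", text)


def collapse_among_siblings(elem):
    """Collapse only within runs of characters that are direct siblings."""
    parts = [collapse(elem.text or "")]
    for child in elem:
        parts.append(collapse_among_siblings(child))
        parts.append(collapse(child.tail or ""))
    return "".join(parts)


def collapse_flattened(elem):
    """Collapse adjacent spaces regardless of element boundaries."""
    return collapse("".join(elem.itertext()))


block = ET.fromstring(SAMPLE)
print(repr(collapse_among_siblings(block)))  # prints: 'before  inside'
print(repr(collapse_flattened(block)))       # prints: 'before inside'
```

Under the sibling-only reading the space before the fo:inline and the space at its start both survive, so two spaces appear in the output.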

The user would prefer to think in terms of collapsing of adjacent white space glyph areas. The comments of the XSL editors have made it clear, however, that white-space-collapse is strictly interpreted in terms of sibling flow objects. On the other hand, they do not make it clear why they place white-space-collapse handling at the area building stage. As a result the user must be careful not to add extra white space to inline content.

Remarks on white-space-treatment and white-space-collapse

The values ignore and preserve of white-space-treatment would better be combined with white-space-collapse into a new property, called something like white-space-treatment, with three values ignore, collapse and preserve as follows:

The property with the remaining values then could be called something like around-line-break. Unfortunately, the remaining three values have linefeed in their name, where linebreak is intended.

XslFoWhiteSpaceHandling (last edited 2009-09-20 23:52:43 by localhost)