This document is based on Simon Pepping’s PageLayout/PageAndLineBreaking document, and extends and illustrates it with some usecases generated by a prototype implementation. The goal is to demonstrate the feasibility of this approach before starting the real implementation, and dealing with XSL-FO’s intricacies.
Separate Line- and Page-Breaking: the Current Approach
As a quick reminder, the current approach consists in two independent steps:
- break paragraphs into line, taking the width of the first page of the page sequence as a basis. After a paragraph is handled, its best layout is retained and boxes are created for each line. The ‘width’ of the box corresponds to the line height, which in common cases is the same for every line of the paragraph. Between each box, a penalty is inserted to allow breaking inside the paragraph, except for the first and last few lines (widows and orphans settings)
In olden times when wishing box w=12 still helped one, there lived a box w=12, penalty w=0 p=0 king whose daughters were all box w=12, penalty w=0 p=0 beautiful, but the youngest was => box w=12, penalty w=0 p=0 so beautiful that the sun itself, box w=12, penalty w=0 p=0 which has seen so much, was box w=12, penalty w=0 p=0 astonished whenever it shone in box w=12 her face. box w=12
- Once all of the line-level breakings have been made, switch to page-level breaking. This is the very same as line-breaking, since there is a conceptual equivalence between words and lines, lines and pages, paragraphs and page sequences. Take the above paragraph, rotate it 90° clockwise, and then you have your paragraphs, pages and sequences of pages (the first page being the rightmost one).
The known issue with this approach is that it is impossible to handle pages in the page sequence with different widths (this would correspond to different —forced— line heights at the line-level, which isn’t really a problem). The paragraphs were broken in an optimal way that corresponds to a certain page width; if the page width changes, the optimal layout of the paragraph changes as well.
In olden times when wishing still helped one, there lived a king whose daughters were all beautiful, but the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever it shone in her face.
With just a slight change in the width, the optimal layout is completely modified, even occupying one line less than the previous one.
Moreover, it may be difficult to represent every kind of vertical content in the box-glue-penalty model. Tables in particular are quite challenging. The current algorithm that merges parallel sequences of Knuth elements (one for each cell), is complicated, limited (it doesn’t handle elastic spaces)... and wrong! (See TableLayout/KnownProblems.)
So the apparent simplicity of having a very similar process for both line- and page-breaking introduces difficulties at other places. How can we overcome those limitations?
Mixing Page- and Line-breaking
There is a solution that is very simple (well, as long as you don’t start any implementation...): we can do both at the same time. Whenever a feasible line break is found, we can look if it also makes a feasible page break. If yes, we have to record two active nodes: one for the case where we stay on the current page, on which we will add the next line; one that corresponds to the page break. The next line will then start on a new page, possibly with a different line width. If the line break doesn’t make a feasible page break (because the page is not full yet, or because this is only the first line of the paragraph and the orphan parameter is set to two lines, etc.), then no problem, we only record one active node, such that the next line will be on the same page.
In the prototype implementation described below, some slight changes were made compared to the FOP implementation:
- a glue following a box no longer makes a legal break; this is awkward, and this complicates the implementation unnecessarily (with that prevIsBox boolean that’s so easy to forget). A penalty must now be inserted instead, which allows to achieve the same functionality. The impact on memory usage is negligeable, since it can easily be re-used in a flyweight pattern.
the notion of ‘active node’ has been replaced with ‘layout’; this is pretty much the same thing apart from the following distinction: an active node represents a break: the index of the element at which the break occurs, the number of the line/page ended by this break, etc. Whereas a layout represents a context in which the next line/page will be inserted. An active node ends line 1, a layout starts line 2. An active node points to the last element of the line, a layout points to first element that will be found on the next line (sort of). This change in the point of view seems minor but may turn out to ease things a lot.
More About Layout Contexts
The notion of context becomes very important when mixing page and line breaking, in order to avoid duplicating work. If two contexts are equivalent, page or line breaking performed in one context will also be valid in the other one.
At the page level, only the page number may matter. If, say, even pages have different widths than odd pages, then we must distinguish them. If all the pages have the same widths, then all contexts are equivalent: whether a paragraph starts page 1 or page 50, it will always have to be laid out on lines with the same width.
At the line level, this is slightly more complicated. The page number may matter, but also the offset from the top of the page. Indeed, when a new line break is found we must determine if it can end the page or not. If this is the case then the next line may have to be typeset on a page with a different width.
This notion of context (or class) is probably the most important thing in the new approach. It must be thoroughly defined in order to avoid duplicating work, but must remain correct with respect to the dynamic programming approach. If we consider as equivalent two things that are not, then we may end up with a completely wrong layout.
In the prototype, layout contexts are kept simple. Layouts are divided into page-level layouts and line-level layouts: the latter simply contain line-level progress informations (how far are we in the paragraph?) in addition to the page-level informations found in the former. At every time there is only one type of active layout: either page-level (if the currently considered element is a page-level element), or line-level (if the current element is a paragraph). See below for more details.
Layouts are stored in a common two-level hash-like structure. They are first divided with respect to their page numbers: all of the layouts corresponding to a same page are equivalent from the page-level point of vue. But this is not enough for the line level, so they are then divided with respect to the amount of length, stretch, shrink they already occupy on the page (bpd+stretch-shrink). All of the layouts having the same bpd+stretch-shrink values are equivalent from a line-level point of view.
So iterating over the layouts is not the same whether we are at the page level or the line level; in the former case, one iteration corresponds to all of the layouts having the same page number. In the latter case, one iteration corresponds to all of the layouts having the same bpd+stretch-shrink values. This is finer grain, so there are likely to be more iterations at the line level.
Block-level Knuth Elements
There is a major change between the new approach and the current one. In the current one, Knuth boxes are created for each line of a paragraph, before any page breaking is considered. In the new approach this is no longer valid! If the paragraph is typeset on two pages with different widths (and this may well be the case for paragraphs appearing after a high number of pages have already been typeset), then the lines are likely to be completely different. Worse, the total number of lines may be different too. So we have to create two different sequences of Knuth elements for each possibility. We can no longer just point to the index of the element in the sequence (the line of the paragraph) that ends the previous page break. We now must also identify the particular sequence that was chosen for this layout.
Each layout then contains a list of the elements making up this layout. For a line, this corresponds to all the words present on the line. Every time a new Knuth element is being considered, it is added to the particular list of each layout. This makes it easy to handle the suppression of glues and penalties at the beginning of a new line/page: if the element is a glue or a penalty and the list is still empty, then we simply don’t add it.
Of course, in a real-life implementation this will be necessary to optimize that, and to somehow share a single list among equivalent layouts. But this is another topic, and for a prototype we can forget this.
A starting layout representing ‘Nothing on page 1’ is created to initialize the algorithm. The main loop then iterates over page-level elements:
- if the element is a box or a glue, it is added to each active layout’s list of elements. Its length (plus stretch and shrink for a glue) are added to the progress informations of the layout
if the element is a non-infinite penalty (legal page break), the possibility of page breaks is considered:
- for each layout class (each active page number), iterate over the layouts of this class. For each layout, see if a full page is reached; if this is the case, compute the corresponding demerits and record this feasible page break.
- once all the layouts of a class have been considered, we can choose the best feasible break, if any. A new layout corresponding to the new (empty) page is created, and the selected layout is recorded as the best one for the previous page.
- if the element is a paragraph, then the line-level breaking mode is entered. All the active layouts are replaced with new ones, that are identical except that they also hold progress informations for line breaking. Then the same loop is started, this time iterating over the paragraph’s elements.
The only difference in line-breaking is after a legal break has been considered: all of the new created layouts must also be considered for page breaking. And only at this moment! Indeed, layouts of different line class may belong to the same page class. A layout corresponding to page 3 and consisting of 9 lines of a paragraph is equivalent, in the page context, to a layout for page 3 with 10 lines. So the newly created layouts are stored in a same structure that will be iterated over at the page level to find potential page breaks. These page breaks will correspond to the paragraph being split over 2 (or several) pages.
Very similar processes are performed both for page- and line-breaking, so it’s important to factorize the common behaviour. In fact we have two different entities: a breaker, an object that iterates over a list of elements searching for legal breaks; and a legal break handler, an object that, given a legal break, iterates over the active layouts and select the best breaks.
So here we have two breakers: a page-level one and a line-level one; and three legal break handlers: one for regular page-level breaks, one for line-level breaks, and one for page-level breaks inside a paragraph.
A prototype has been written in Ruby to illustrate the previous ideas. The figure below has been produced by this prototype and shows all the possible ways of layouting 2 simple paragraphs. The framed nodes correspond to finished pages. Plain arcs link a layout to the layout for the previous line/page that was chosen as the best one. Dashed arcs show other feasible but less good-looking ancestors.
The various ways of laying out two paragraphs (zoom-friendly PDF version available)
An important note, first: in the previous figure, the best final layout contains the 5-line version of the first paragraph. But you will probably find that the 4-line version actually ‘looks’ nicer (less stretched spaces). This is because the prototype has been configured with very loose settings that don’t reflect reality (spaces are allowed to stretch up to almost 8 times their normal widths!). This made it easier to get enough layout possibilities and better illustrate the process. In real life the 5-line version of the first paragraph would never make a feasible layout. Also, for simplicity widows and orphans have not been implemented. This means that it’s considered to be perfectly acceptable to have only one line of a paragraph on a page.
So we can notice on the figure that there are two ways of laying out the first paragraph: either on 4 lines or on 5 lines. We can keep those two possibilities, because even if one has higher demerits than the other one, it may allow to achieve an overall layout with lower total demerits. So when the second paragraph begins, there are two active layouts.
The 5-line version of the first paragraph allows to finish the first page with one line of the second paragraph on it (the space between the paragraphs has an optimal value of one line and may be stretched up to two lines. However the stretching is not represented on the diagram.). A new layout is then recorded and there are now two possibilities of placing the following Knuth elements: either on the first page or on the second one.
It is also possible to add one more line of the second paragraph on the first page, which makes another possible page layout. It may be interesting to note that for each page layout having two lines of the second paragraph, there is another way of making the page by choosing the 4-line version of the first paragraph instead (follow the dashed lines). But that would imply to stretch the space between the two paragraphs, whereas with the 5-line version this space can keep its natural length, which is considered to be better (again, the algorithm settings don’t reflect reality...).
And so on and so on. In the end, the preferred layout is the one that corresponds to the 5-line version, plus one line of the second paragraph on the first page. The other two possibilities are considered to be less good-looking.
The approach described here could be called ‘total-total-fit’: total fit at the line-level, total-fit at the page level, plus every possibility of laying out a paragraph is retained. This approach will be convenient for book-like document, where choosing a version of a paragraph with one more line than the optimum allows to avoid an orphan line further on (or even lead to a feasible layout, while keeping only the optimum version might not allow to find a way of laying out the pages). Of course this comes at a resource price: more memory used, more computations. But this is very easy to cut down the number of active layouts. When the end of the first paragraph is reached, we can choose to keep only the best layout (like is currently the case in FOP). In the example above this means that the 5-line version wouldn’t be retained, which removes almost half of the nodes in the graph. It’s only the matter of a setting in the algorithm.
Also, it is possible to switch to best-fit, or even first-fit, at the line level (although this is probably not desirable, even for business-type documents). For first-fit the starting layout would be removed as soon as a feasible first line has been found (here, ending on ‘wishing’). For best-fit, as soon as there is no more possibility to make line n (the adjustment ratio becomes < −1), the best one is chosen and all of the other ones are suppressed. For example, for line 2 of the first paragraph, the best one is the line that ends with ‘there lived a’. The other one (ending with ‘there lived’) would be suppressed, which again allows to reduce the number of active nodes.
So with the same infrastructure and a few parameters, this should be possible to make everyone happy: those who want speed and low memory consumption, and those who want the best layout, whatever the number of additional memory sticks they’d have to buy.
The prototype may be found in the Temp_Interleaved_Page_Line_Breaking branch. It produces the graph of active layouts on stdout in the dot format, and on stderr the list of nodes active at the end of each step, in a more or less pretty-print format (more generally stderr is used to dump any kind of debugging information). It’s best to visualize those informations with a text editor that doesn’t wrap long lines.
A PDF version of the diagram can be obtained by running the following command (dot is part of the graphviz package): dot -Tpdf -o graph.pdf graph.dot. Other available output formats are svg, png, gif, etc., see the help of the dot utility.