This page contains notes on the initial PDF accessibility implementation in Apache FOP.


Testing PDF accessibility (Tagged PDFs)

Common requirement for tagged PDFs and PDF accessibility

(R1) The document's logical structure must be included in the PDF file (see PDF Reference 1.4, section 9.7).
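As a rough illustration of (R1) in PDF 1.4 terms: the document catalog flags the file as tagged (/MarkInfo << /Marked true >>) and points to a /StructTreeRoot, whose descendants are structure element dictionaries (/Type /StructElem) that reference marked-content sequences on a page by MCID. The Java sketch below only produces the dictionary text for a tiny tree. StructElem and TaggedCatalog are hypothetical names, not FOP classes, and object numbers are assumed to be assigned by some PDF writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a PDF structure element; not FOP's actual classes.
final class StructElem {
    final int objNum;         // object number, assumed to be assigned by a PDF writer
    final String type;        // structure type, e.g. "Document" or "P"
    final StructElem parent;  // null means the parent is the StructTreeRoot
    final List<StructElem> kids = new ArrayList<>();
    final List<Integer> mcids = new ArrayList<>(); // marked-content IDs on one page

    StructElem(int objNum, String type, StructElem parent) {
        this.objNum = objNum;
        this.type = type;
        this.parent = parent;
        if (parent != null) {
            parent.kids.add(this);
        }
    }

    // Builds the element's dictionary: /S is the structure type, /P the parent,
    // /Pg the page holding its content and /K its kids. For simplicity, child
    // elements are listed before MCIDs; a real writer keeps reading order.
    String toPdf(int rootObjNum, int pageObjNum) {
        StringBuilder k = new StringBuilder();
        for (StructElem kid : kids) {
            k.append(' ').append(kid.objNum).append(" 0 R");
        }
        for (int mcid : mcids) {
            k.append(' ').append(mcid);
        }
        int parentRef = (parent == null) ? rootObjNum : parent.objNum;
        String pg = mcids.isEmpty() ? "" : " /Pg " + pageObjNum + " 0 R";
        return "<< /Type /StructElem /S /" + type
                + " /P " + parentRef + " 0 R" + pg
                + " /K [" + k.toString().trim() + "] >>";
    }
}

final class TaggedCatalog {
    // Catalog entries that mark the file as tagged. The /ParentTree that the
    // StructTreeRoot needs to map MCIDs back to elements is not shown.
    static String catalog(int pagesObjNum, int structRootObjNum) {
        return "<< /Type /Catalog /Pages " + pagesObjNum + " 0 R"
                + " /MarkInfo << /Marked true >>"
                + " /StructTreeRoot " + structRootObjNum + " 0 R >>";
    }
}
```

For example, a Document element (object 10) with one P child (object 11) whose text is marked-content 0 on page object 4 serializes the P child as << /Type /StructElem /S /P /P 10 0 R /Pg 4 0 R /K [0] >>.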

Additional requirements for accessible PDFs

Screen readers require additional information in tagged PDF files in order to read documents aloud. To enable proper vocalization, PDF supports the following features:

(R2) Providing textual descriptions for images (see PDF Reference 1.4 section 9.8.2, "Alternate Descriptions")

(R3) Specifying the natural language used for text in a PDF document, for example English or German (see PDF Reference 1.4, section 9.8.1, "Natural Language Specification"). An accessible PDF document should specify a default language, which applies to all text in the document. The language can be overridden on descendant elements, but FOP does not currently carry that information over to the PDF output.
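Both features map to a few dictionary entries in PDF 1.4: an alternate description (R2) is the /Alt text string of the structure element that encloses the image (typically one with /S /Figure), and the default language (R3) is the /Lang entry of the document catalog, which a structure element's own /Lang can override. The hypothetical Java helpers below, not part of FOP, write these entries. They assume ASCII-only strings and assume the language comes from XSL-FO style language and country values.

```java
import java.util.Locale;

// Hypothetical helpers for the (R2) and (R3) entries; not part of FOP's API.
final class AccessibilityEntries {
    // Escapes an ASCII string as a PDF literal string: backslash and parentheses
    // get a preceding backslash, LF and CR become \n and \r. Non-ASCII text would
    // need UTF-16BE encoding with a byte order mark instead (not handled here).
    static String literal(String s) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': case '(': case ')':
                    sb.append('\\').append(c);
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.append(')').toString();
    }

    // (R2) Figure structure element with an alternate description. Other entries,
    // such as /K pointing at the image's marked content, are omitted.
    static String figure(int parentObjNum, String description) {
        return "<< /Type /StructElem /S /Figure /P " + parentObjNum + " 0 R"
                + " /Alt " + literal(description) + " >>";
    }

    // (R3) Builds a language tag from language and country values,
    // e.g. ("en", "US") -> "en-US". Returns null when no language is given.
    static String langTag(String language, String country) {
        if (language == null || language.isEmpty() || language.equals("none")) {
            return null;
        }
        String tag = language.toLowerCase(Locale.ROOT);
        if (country != null && !country.isEmpty() && !country.equals("none")) {
            tag += "-" + country.toUpperCase(Locale.ROOT);
        }
        return tag;
    }

    // (R3) /Lang entry, usable in the catalog (document default) or in a
    // structure element (override). Empty when no language is known.
    static String langEntry(String tag) {
        return tag == null ? "" : " /Lang " + literal(tag);
    }
}
```

Escaping every parenthesis is slightly more than PDF strictly requires (balanced pairs may appear unescaped), but it keeps the helper simple and always valid.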


The initial implementation covers PDF output only, since PDF is the only output format currently implemented that supports accessibility.

The main challenge is to find, for each piece of text or image being rendered, the FO element it corresponds to. This mapping is required to build the structure tree.

The current implementation runs two XSLT transforms as a preprocessing step in Fop.getDefaultHandler(). The first, addPtr.xsl, adds a pointer attribute with a unique value to each FO that appears in the structure tree. The second, reduceFOTree.xsl, removes from the input FO all elements, attributes and text that are not required in the structure tree. The result of this second transform can be seen in the intermediate XML, where it is split per page-sequence. In the intermediate format (IF) it appears as the structure-tree element shown below; in the Area Tree XML the corresponding element is structureTree:
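A minimal sketch of this two-step preprocessing with the standard JAXP API (javax.xml.transform), chaining two transforms. The inline stylesheets are simplified, hypothetical stand-ins for addPtr.xsl and reduceFOTree.xsl, not the files used in FOP: the first copies the FO document and adds a foi:ptr attribute computed with XPath's generate-id() to fo:block, fo:inline and fo:external-graphic (an arbitrary choice here), and the second keeps only fo:flow and pointer-carrying FOs, each stripped down to its foi:ptr attribute. The namespace URI bound to foi is a placeholder, and the per-page-sequence split is not modeled.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; not FOP's Fop.getDefaultHandler() code.
final class StructurePreprocess {
    private static final String NS =
            " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
            + " xmlns:foi='urn:example:fop-internal'"; // placeholder URI, not FOP's

    // Stand-in for addPtr.xsl: identity copy plus a unique foi:ptr on selected FOs.
    static final String ADD_PTR =
            "<xsl:stylesheet version='1.0'" + NS + ">"
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match='fo:block|fo:inline|fo:external-graphic'>"
            + "<xsl:copy><xsl:apply-templates select='@*'/>"
            + "<xsl:attribute name='foi:ptr'><xsl:value-of select='generate-id()'/></xsl:attribute>"
            + "<xsl:apply-templates select='node()'/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Stand-in for reduceFOTree.xsl: keep fo:flow and pointer-carrying FOs,
    // drop every other attribute and all text; other elements are skipped
    // but their child elements are still visited.
    static final String REDUCE =
            "<xsl:stylesheet version='1.0'" + NS + ">"
            + "<xsl:template match='/'><xsl:apply-templates select='//fo:flow'/></xsl:template>"
            + "<xsl:template match='fo:flow|fo:*[@foi:ptr]'>"
            + "<xsl:copy><xsl:copy-of select='@foi:ptr'/><xsl:apply-templates select='*'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match='*'><xsl:apply-templates select='*'/></xsl:template>"
            + "</xsl:stylesheet>";

    // Applies one stylesheet to an XML string and returns the serialized result.
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Runs both steps in order: pointer insertion, then reduction.
    static String structureTree(String fo) throws TransformerException {
        return transform(REDUCE, transform(ADD_PTR, fo));
    }
}
```

Called on an FO document whose flow contains <fo:block font-size='12pt'>hello</fo:block>, structureTree() should return an fo:flow containing one empty fo:block that carries only foi:ptr. The pointer value depends on the XSLT processor's generate-id() implementation, so it is best treated as an opaque token.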

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="" xmlns:xlink="" xmlns:nav="">
    <header>
        <x:xmpmeta xmlns:x="adobe:ns:meta/">
            <rdf:RDF xmlns:rdf="">
                <rdf:Description xmlns:xmp="" rdf:about="">
                    <xmp:CreatorTool>Apache FOP Version SVN branches/Temp_AreaTreeNewDesign</xmp:CreatorTool>
                </rdf:Description>
            </rdf:RDF>
        </x:xmpmeta>
    </header>
    <page-sequence>
        <structure-tree>
            <fo:flow xmlns:fo="">
                <fo:block xmlns:foi="" foi:ptr="N10014"/>
            </fo:flow>
        </structure-tree>
        <page index="0" name="1" page-master-name="1" width="594720" height="792000">
            <content>
                <viewport width="594720" height="792000">
                    <font family="sans-serif" style="normal" weight="400" variant="normal" size="12000" color="#000000"/>
                    <text x="0" y="10266" ptr="N10014">hello</text>
                </viewport>
            </content>
        </page>
    </page-sequence>
</document>

The pointer information is passed to the PDF text and image drawing methods, where it is used to associate the PDF content produced for the text or image with its parent structure element. The intermediate XML formats carry this information over (see the ptr attribute of the text and image elements above).
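One way the PDF side could use the pointer is sketched below in Java. It assumes each ptr value has already been mapped to a structure element; when text or an image is drawn, the content-stream operators are wrapped in a BDC/EMC marked-content sequence with a new MCID, and that MCID is recorded as a kid of the element. MarkedContentTagger, Elem, register() and tag() are hypothetical names rather than FOP's drawing API, only one page is modeled, and the /ParentTree that maps MCIDs back to elements is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; not FOP's API.
final class MarkedContentTagger {
    // Minimal stand-in for a structure element: its type and the MCIDs it owns.
    static final class Elem {
        final String type;
        final List<Integer> mcids = new ArrayList<>();

        Elem(String type) {
            this.type = type;
        }
    }

    private final Map<String, Elem> byPtr = new HashMap<>(); // ptr value -> element
    private int nextMcid = 0; // MCIDs only need to be unique within one page

    void register(String ptr, Elem elem) {
        byPtr.put(ptr, elem);
    }

    // Wraps the operators for one text or image in BDC/EMC and associates the
    // new MCID with the structure element found through the pointer.
    String tag(String ptr, String operators) {
        Elem elem = byPtr.get(ptr);
        if (elem == null) {
            return operators; // no structure element known: emit content untagged
        }
        int mcid = nextMcid++;
        elem.mcids.add(mcid);
        return "/" + elem.type + " <</MCID " + mcid + ">> BDC\n" + operators + "\nEMC";
    }
}
```

For the text element in the sample above, registering a P element under N10014 and then tagging the operators that draw "hello" yields a marked-content sequence with MCID 0, and the P element's kids become [0].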


PDF_Accessibility (last edited 2009-10-26 15:41:26 by uk)