This page contains notes on the initial PDF accessibility implementation in Apache FOP.
PDF accessibility / Tagged PDF links
Testing PDF accessibility (Tagged PDFs)
- Acrobat Professional: the Accessibility Check creates a report indicating any deficiencies in a PDF document.
Screen readers can read tagged PDFs; see Wikipedia.
Common requirement for tagged PDFs and PDF accessibility
(R1) The document's logical structure has to be included in the PDF file (see PDF Reference 1.4 section 9.7).
Additional requirements for accessible PDFs
Screen readers require additional information in tagged PDF files in order to read documents aloud. To enable proper vocalization, PDF supports the following features:
(R2) Providing textual descriptions for images (see PDF Reference 1.4 section 9.8.2, "Alternate Descriptions")
(R3) Specifying the natural language used for text in a PDF document, for example English or German (see PDF Reference 1.4 section 9.8.1, "Natural Language Specification"). An accessible PDF document should include the document's default language, which applies to all text in the PDF document. Descendant elements can override the document's language, but FOP does not currently carry that information over to the PDF output.
The initial implementation is for PDF output only, as this is the only currently implemented format that supports accessibility.
The challenge is to find the FO element that corresponds to a piece of text or an image to be rendered. This is required to build the structure tree.
The current implementation runs two XSLT transforms as a preprocessing step in Fop.getDefaultHandler. The first, addPtr.xsl, adds a pointer attribute with a unique value to each FO that appears in the structure tree. The second, reduceFOTree.xsl, removes all elements, attributes and text from the input FO that are not required in the structure tree. The result of this second transform can be seen in the intermediate XML, where it is split per page-sequence. See the structure-tree element below for the IF (structureTree for Area Tree XML):
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://xmlgraphics.apache.org/fop/intermediate"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:nav="http://xmlgraphics.apache.org/fop/intermediate/document-navigation">
  <header>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
          <xmp:CreateDate>2009-01-20T22:27:23-08:00</xmp:CreateDate>
          <xmp:CreatorTool>Apache FOP Version SVN branches/Temp_AreaTreeNewDesign</xmp:CreatorTool>
          <xmp:MetadataDate>2009-01-20T22:27:23-08:00</xmp:MetadataDate>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
  </header>
  <page-sequence>
    <structure-tree>
      <fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:block xmlns:foi="http://xmlgraphics.apache.org/fop/internal" foi:ptr="N10014"/>
      </fo:flow>
    </structure-tree>
    <page index="0" name="1" page-master-name="1" width="594720" height="792000">
      <page-header/>
      <content>
        <viewport width="594720" height="792000">
          <font family="sans-serif" style="normal" weight="400" variant="normal" size="12000" color="#000000"/>
          <text x="0" y="10266" ptr="N10014">hello</text>
        </viewport>
      </content>
      <page-trailer/>
    </page>
  </page-sequence>
  <trailer/>
</document>
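The two preprocessing passes described above can be sketched roughly as follows. This is a Python stand-in for illustration only, not the actual addPtr.xsl or reduceFOTree.xsl: the set of FO elements kept in the structure tree, the choice of which ones receive a pointer, and the counter-based pointer values (N1, N2, ...) are all assumptions.

```python
import copy
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
FOI_NS = "http://xmlgraphics.apache.org/fop/internal"
PTR = f"{{{FOI_NS}}}ptr"


def fo(name):
    """Return the namespace-qualified tag for an FO element name."""
    return f"{{{FO_NS}}}{name}"


# Assumption: an illustrative subset of FO elements kept in the structure tree.
KEPT = {fo("flow"), fo("block"), fo("inline"), fo("external-graphic")}
# Assumption: only content-bearing elements get a pointer (as in the sample IF).
POINTED = KEPT - {fo("flow")}


def add_pointers(root):
    """Pass 1: return a copy in which each pointed element has a unique foi:ptr."""
    result = copy.deepcopy(root)
    counter = 0
    for elem in result.iter():
        if elem.tag in POINTED:
            counter += 1
            elem.set(PTR, f"N{counter}")  # hypothetical id scheme
    return result


def reduce_tree(elem):
    """Pass 2: keep only structure elements and foi:ptr; drop text and other attributes."""
    reduced = ET.Element(elem.tag)
    if PTR in elem.attrib:
        reduced.set(PTR, elem.attrib[PTR])
    for child in elem:
        if child.tag in KEPT:
            reduced.append(reduce_tree(child))
    return reduced


FO_INPUT = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="12pt">hello</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""

flow = ET.fromstring(FO_INPUT).find(f".//{fo('flow')}")
reduced = reduce_tree(add_pointers(flow))
```

Here `reduced` has the same shape as the structure-tree content in the sample: an fo:flow containing an empty fo:block that carries only its pointer.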
The pointer information is passed to the PDF text or image drawing methods. It is used to associate the PDF stream produced from the text or image with its parent structure element. The intermediate XML formats carry that information over (see the text element above; the image element carries it as well).
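As a minimal sketch of that association, the following Python reads a trimmed copy of the IF sample and matches each content element's ptr value against the foi:ptr values in the structure tree. The helper names (index_structure, link_content) are hypothetical and this is not how FOP's renderer is written; it only shows the lookup that the pointer makes possible.

```python
import xml.etree.ElementTree as ET

IF_NS = "http://xmlgraphics.apache.org/fop/intermediate"
FOI_NS = "http://xmlgraphics.apache.org/fop/internal"
FOI_PTR = f"{{{FOI_NS}}}ptr"

# Trimmed version of the intermediate-format sample shown above.
IF_XML = """<document xmlns="http://xmlgraphics.apache.org/fop/intermediate">
  <page-sequence>
    <structure-tree>
      <fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:block xmlns:foi="http://xmlgraphics.apache.org/fop/internal" foi:ptr="N10014"/>
      </fo:flow>
    </structure-tree>
    <page index="0" name="1">
      <content>
        <text x="0" y="10266" ptr="N10014">hello</text>
      </content>
    </page>
  </page-sequence>
</document>"""


def index_structure(page_sequence):
    """Map each foi:ptr value to its element in the structure tree."""
    tree = page_sequence.find(f"{{{IF_NS}}}structure-tree")
    return {el.get(FOI_PTR): el for el in tree.iter() if el.get(FOI_PTR)}


def link_content(page_sequence):
    """Pair each rendered item with the local name of its structure element."""
    index = index_structure(page_sequence)
    links = []
    for page in page_sequence.iter(f"{{{IF_NS}}}page"):
        for item in page.iter():
            ptr = item.get("ptr")
            if ptr in index:
                structure_name = index[ptr].tag.split("}")[-1]
                links.append((structure_name, item.text))
    return links


document = ET.fromstring(IF_XML)
sequence = document.find(f"{{{IF_NS}}}page-sequence")
links = link_content(sequence)
```

For the sample, `links` pairs the text "hello" with the fo:block whose pointer is N10014.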
(R3) Decide how the language should be defined. Other implementations specify @xml:lang at the fo:root level. The same attribute is set on descendants to override the default language. There are also the common FO properties country and language to consider.
[JM] XSL defines xml:lang as a shorthand for country/language/script. So both are equivalent from a user's perspective. It should be verified that xml:lang is properly mapped to the other three properties. Internally, the code should work off the basic XSL properties, not the shorthand.
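To make the shorthand mapping concrete, here is a small Python sketch that splits an xml:lang value into the three basic properties. The function name expand_xml_lang is hypothetical, and treating a four-letter subtag as the script and a two-letter subtag as the country is an assumption that follows common language-tag conventions; how FOP actually expands the shorthand still needs to be verified, as noted above.

```python
def expand_xml_lang(value):
    """Split an xml:lang value into language, script and country parts.

    Assumption: subtags are separated by "-", a 4-letter alphabetic subtag
    is a script and a 2-letter alphabetic subtag is a country. Parts that
    are not present are returned as None.
    """
    parts = value.split("-")
    result = {"language": parts[0].lower(), "script": None, "country": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()
        elif len(part) == 2 and part.isalpha():
            result["country"] = part.upper()
    return result


english_us = expand_xml_lang("en-US")
german = expand_xml_lang("de")
chinese_tw = expand_xml_lang("zh-Hant-TW")
```

Working off a dictionary of the basic properties, rather than the raw xml:lang string, mirrors the suggestion that internal code should not depend on the shorthand.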
- Implement PDF/A-1a support when PDF Accessibility is available (JM).
(R1) Implement support for the "role" property, which would make it possible to distinguish fo:blocks containing titles from fo:blocks containing normal text. This may involve adding support for the "RoleMap" dictionary entry of PDF's "StructTreeRoot".
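One possible shape for the role handling is sketched below in Python. It is not an existing FOP design: the default mapping from FO elements to structure types, the fallback to "Div", and the function names are assumptions. The names P, H1, H2, Div, Span and Figure are standard structure types in the PDF Reference, and the RoleMap maps non-standard type names to standard ones.

```python
# A subset of the standard structure types defined in the PDF Reference.
STANDARD_TYPES = {"P", "H1", "H2", "Div", "Span", "Figure"}

# Assumption: illustrative defaults used when an FO element has no role.
DEFAULT_TYPE = {"block": "P", "inline": "Span", "external-graphic": "Figure"}


def structure_type(fo_name, role=None):
    """Pick the structure type: an explicit role wins over the default."""
    if role:
        return role
    return DEFAULT_TYPE.get(fo_name, "Div")


def build_role_map(used_types, custom_mappings):
    """Return RoleMap entries for every non-standard type in use.

    Assumption: a custom type without a mapping falls back to "Div".
    """
    role_map = {}
    for name in sorted(used_types):
        if name not in STANDARD_TYPES:
            role_map[name] = custom_mappings.get(name, "Div")
    return role_map


# A title block with a custom role, and a normal text block without one.
title_type = structure_type("block", role="Title")
text_type = structure_type("block")
role_map = build_role_map({title_type, text_type}, {"Title": "H1"})
```

With this split, a title block and a normal text block end up with different structure types, and only the custom "Title" type needs a RoleMap entry.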