Differences between revisions 4 and 5
Revision 4 as of 2009-09-20 23:52:09
Size: 6823
Editor: localhost
Comment: converted to 1.6 markup
Revision 5 as of 2009-10-26 15:41:26
Size: 5571
Comment: Updated to reflect current implementation, user guide moved to website
Line 27: Line 27:
'''(R3)''' Specifying the natural language used for text in a PDF document - for example, as English or German (see PDF Reference 1.4 section 9.8.1, "Natural Language Specification"). An accessible PDF document should include the document's default language which applies to all text in a PDF document. You can change a language on descendant elements by overriding the document's language.

The section [[#head-1aae9f58ba7b221421d56567b1b6a50d9a75792c|Changes to your XSL-FO input files]] will illustrate where you provide this information in the input XSL-FO file.
'''(R3)''' Specifying the natural language used for text in a PDF document - for example, as English or German (see PDF Reference 1.4 section 9.8.1, "Natural Language Specification"). An accessible PDF document should include the document's default language which applies to all text in a PDF document. The language can be set on descendant elements by overriding the document's language, but FOP does not currently carry over that information to the PDF output.
Line 33: Line 31:
The initial implementation is for PDF output only (as this is the only currently implemented format that supports accessibility) and is based on the code for the new intermediate format.
The initial implementation is for PDF output only, as this is the only currently implemented format that supports accessibility.
Line 35: Line 33:
The challenge is to find the corresponding FO of a text element or image that is sent to the PDFPainter.drawText respectively PDFPainter.drawImage from the LM's. This is required to build the structure tree.
The challenge is to find the FO element that corresponds to a piece of text or an image to be rendered. This is required to build the structure tree.
Line 37: Line 35:
The current implementation uses 2 XSLT transforms as a preprocess in Fop.getDefaultHandler. The first {{{addPtr.xsl}}} adds a pointer attribute with an unique value to each FO that appears in the structure tree. The second transform {{{reduceFOTree.xsl}}} removes all elements, attributes and text from the input FO, that are not required in the structure tree. The result from this second transform can be seen in the new IF, where it is split per page-sequence. See the element {{{structure-tree}}} below:
The current implementation uses 2 XSLT transforms as a preprocess in Fop.getDefaultHandler. The first {{{addPtr.xsl}}} adds a pointer attribute with a unique value to each FO that appears in the structure tree. The second transform {{{reduceFOTree.xsl}}} removes all elements, attributes and text from the input FO, that are not required in the structure tree. The result from this second transform can be seen in the intermediate XML, where it is split per page-sequence. See the element {{{structure-tree}}} below for the IF ({{{structureTree}}} for Area Tree XML):
Line 74: Line 72:
The pointer information is also passed to the PDFPainter.drawText and PDFPainter.drawImage method. You can find the pointer attributes in the above IF in the {{{page}}} element in the elements {{{text}}} and {{{image}}}.

== User Guide ==

=== Enabling PDF accessibility ===

There are 3 ways to enable PDF accessibility:

 * '''Command line''' The command line option -a turns on accessibility. {{{fop -a -fo testcases/tc1/tc1.fo -pdf testcases/tc1/tc1.pdf}}}
 * '''Embedding''' {{{userAgent.getRendererOptions().put("accessibility", Boolean.TRUE);}}}
 * '''Optional setting in fop.xconf file'''
{{{
<fop version="1.0">
    <accessibility>true</accessibility>
    ...
</fop>
}}}

Make sure to call the new PDF code when you embed FOP in your Java code: {{{MimeConstants.MIME_PDF + ";mode=painter"}}}
=== Changes to your XSL-FO input files ===

 * '''(R1)''' Table cells require a table row as the parent.
 * '''(R1)''' Ensure that the order of {{{fo:block-container}}} in a page corresponds to the reading order.
 * '''(R2)''' Alternate text for images: The attribute {{{fox:alt-text}}} has been added for {{{fo:external-graphic}}} and {{{fo:instream-foreign-object}}}.
 * '''(R3)''' Document's default language: ''The document's default language is currently hard coded to English.''

=== Note ===

Adjust the Java heap size in order to process larger files.
The pointer information is passed to the PDF text or image drawing methods. It is used to associate the PDF stream that will be produced out of the text or image to its parent structure element. The intermediate XML formats carry over that information (see elements {{{text}}} and {{{image}}} above).

This page contains notes on the initial PDF accessibility implementation in Apache FOP.

Introduction

Testing PDF accessibility (Tagged PDFs)

  • Acrobat Professional - Accessibility Check creates a report indicating any deficiencies with a PDF document.
  • Screen readers can read tagged PDFs; see Wikipedia.

Common requirement for tagged PDFs and PDF accessibility

(R1) The document's logical structure has to be included in the PDF file (see PDF Reference 1.4 section 9.7).

Additional requirements for accessible PDFs

Screen readers require additional information in tagged PDF files in order to read documents aloud. To enable proper vocalization, PDF supports the following features:

(R2) Providing textual descriptions for images (see PDF Reference 1.4 section 9.8.2, "Alternate Descriptions")

(R3) Specifying the natural language used for text in a PDF document, for example English or German (see PDF Reference 1.4 section 9.8.1, "Natural Language Specification"). An accessible PDF document should include the document's default language, which applies to all text in the document. The language can be set on descendant elements to override the document's language, but FOP does not currently carry that information over to the PDF output.

Implementation

The initial implementation is for PDF output only, as this is the only currently implemented format that supports accessibility.

The challenge is to find the FO element that corresponds to a piece of text or an image to be rendered. This is required to build the structure tree.

The current implementation uses two XSLT transforms as a preprocessing step in Fop.getDefaultHandler. The first, addPtr.xsl, adds a pointer attribute with a unique value to each FO that appears in the structure tree. The second, reduceFOTree.xsl, removes all elements, attributes and text from the input FO that are not required in the structure tree. The result of this second transform can be seen in the intermediate XML, where it is split per page-sequence. See the element structure-tree below for the IF (structureTree for Area Tree XML):

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://xmlgraphics.apache.org/fop/intermediate" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:nav="http://xmlgraphics.apache.org/fop/intermediate/document-navigation">
    <header>
        <x:xmpmeta xmlns:x="adobe:ns:meta/">
            <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
                    <xmp:CreateDate>2009-01-20T22:27:23-08:00</xmp:CreateDate>
                    <xmp:CreatorTool>Apache FOP Version SVN branches/Temp_AreaTreeNewDesign</xmp:CreatorTool>
                    <xmp:MetadataDate>2009-01-20T22:27:23-08:00</xmp:MetadataDate>
                </rdf:Description>
            </rdf:RDF>
        </x:xmpmeta>
    </header>
    <page-sequence>
        <structure-tree>
            <fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format">
                <fo:block xmlns:foi="http://xmlgraphics.apache.org/fop/internal" foi:ptr="N10014"/>
            </fo:flow>
        </structure-tree>
        <page index="0" name="1" page-master-name="1" width="594720" height="792000">
            <page-header/>
            <content>
                <viewport width="594720" height="792000">
                    <font family="sans-serif" style="normal" weight="400" variant="normal" size="12000" color="#000000"/>
                    <text x="0" y="10266" ptr="N10014">hello</text>
                </viewport>
            </content>
            <page-trailer/>
        </page>
    </page-sequence>
    <trailer/>
</document>
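
The first preprocessing step, giving each FO a unique pointer, can be pictured with a small stand-alone transform. The following is only a sketch under assumptions: the stylesheet is a hypothetical stand-in for addPtr.xsl (whose actual content is not shown on this page), it tags only fo:block and fo:external-graphic, and the class name AddPtrSketch is made up for illustration. It uses the XSLT processor bundled with the JDK and generate-id() to obtain unique values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch; not FOP's addPtr.xsl, only the same idea in miniature. */
final class AddPtrSketch {

    // Identity transform plus one extra rule: fo:block and fo:external-graphic
    // receive a foi:ptr attribute whose value comes from generate-id().
    static final String ADD_PTR_XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
        + " xmlns:foi='http://xmlgraphics.apache.org/fop/internal'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='fo:block|fo:external-graphic'>"
        + "<xsl:copy>"
        + "<xsl:attribute name='foi:ptr'><xsl:value-of select='generate-id()'/></xsl:attribute>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the sketch stylesheet to an XSL-FO document given as a string. */
    static String addPointers(String foXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(ADD_PTR_XSL)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(foXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
                  + "<fo:block>hello</fo:block><fo:block>world</fo:block></fo:root>";
        // Each fo:block in the printed result carries its own foi:ptr value.
        System.out.println(addPointers(fo));
    }
}
```

The exact pointer strings depend on the XSLT processor; only their uniqueness within one document matters for the lookup.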

The pointer information is passed to the PDF text or image drawing methods. It is used to associate the PDF stream produced from the text or image with its parent structure element. The intermediate XML formats carry over that information (see elements text and image above).
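
To make the association concrete, the sketch below reads an intermediate-format document, indexes the structure-tree elements by their foi:ptr value, and then looks up the owner of every text or image element through its ptr attribute. It is an illustration, not FOP code: the class PtrLookupSketch and its method are hypothetical, and only the element names, attribute names and namespaces are taken from the IF sample on this page.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the ptr lookup; FOP's painter code is not shown here. */
final class PtrLookupSketch {
    static final String IF_NS = "http://xmlgraphics.apache.org/fop/intermediate";
    static final String FOI_NS = "http://xmlgraphics.apache.org/fop/internal";

    /** Maps each ptr used by a text or image element to the name of its structure element. */
    static Map<String, String> resolve(String ifXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(ifXml)));

        // Step 1: index every element inside a structure-tree by its foi:ptr value.
        Map<String, Element> byPtr = new LinkedHashMap<>();
        NodeList trees = doc.getElementsByTagNameNS(IF_NS, "structure-tree");
        for (int i = 0; i < trees.getLength(); i++) {
            NodeList nodes = ((Element) trees.item(i)).getElementsByTagNameNS("*", "*");
            for (int j = 0; j < nodes.getLength(); j++) {
                Element structElem = (Element) nodes.item(j);
                String ptr = structElem.getAttributeNS(FOI_NS, "ptr");
                if (!ptr.isEmpty()) {
                    byPtr.put(ptr, structElem);
                }
            }
        }

        // Step 2: resolve the ptr of each painted text or image element.
        Map<String, String> owners = new LinkedHashMap<>();
        for (String name : new String[] {"text", "image"}) {
            NodeList marks = doc.getElementsByTagNameNS(IF_NS, name);
            for (int i = 0; i < marks.getLength(); i++) {
                String ptr = ((Element) marks.item(i)).getAttribute("ptr");
                Element owner = byPtr.get(ptr);
                if (owner != null) {
                    owners.put(ptr, owner.getNodeName());
                }
            }
        }
        return owners;
    }
}
```

Applied to the sample document on this page, the ptr value N10014 on the text element resolves to the fo:block in the structure tree. Content whose ptr has no match is simply left out of the result in this sketch.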

TODO

  • (R3) Decide how the language should be defined. Other implementations specify the xml:lang attribute at the fo:root level. The same attribute is set on descendants to override the default language. There are also the common FO properties country and language to consider.

    • [JM] XSL defines xml:lang as a shorthand for country/language/script. So both are equivalent from a user's perspective. It should be verified that xml:lang is properly mapped to the other three properties. Internally, the code should work off the basic XSL properties, not the shorthand.

  • Implement PDF/A-1a support when PDF Accessibility is available (JM).
  • (R1) Implement support for the "role" property, which would make it possible to distinguish fo:blocks containing titles from fo:blocks containing normal text. This may involve adding support for the "RoleMap" dictionary entry of PDF's "StructTreeRoot".
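
As an illustration of the shorthand discussed in the (R3) item above, an xml:lang value can be split into language, script and country parts. The sketch below does this with java.util.Locale from the JDK. It is not how FOP maps the property: the class XmlLangSketch is hypothetical, and it assumes the value is a well-formed language tag such as "en-US".

```java
import java.util.Locale;

/** Hypothetical sketch: splits an xml:lang value into its component subtags. */
final class XmlLangSketch {

    /** Returns {language, script, country}; a missing part is the empty string. */
    static String[] expand(String xmlLang) {
        Locale locale = Locale.forLanguageTag(xmlLang);
        return new String[] {
            locale.getLanguage(), locale.getScript(), locale.getCountry()
        };
    }

    public static void main(String[] args) {
        String[] parts = expand("en-US");
        System.out.println("language=" + parts[0]
            + " script=" + parts[1] + " country=" + parts[2]);
    }
}
```

Code working from the basic properties, as JM suggests, would then only ever see the three separate values and never the shorthand itself.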

PDF_Accessibility (last edited 2009-10-26 15:41:26 by VincentHennebert)