PDF/A Conformance Notes

This document discusses what needs to be done to make Apache FOP conformant to PDF/A (ISO 19005). PDF/A is an ISO standard that defines additional requirements and restrictions on PDF documents to make them useful for long-term preservation.

References:

Implementing Support for PDF/A-1

Conformance Levels

PDF/A-1 defines two conformance levels: A and B. These are discussed separately below. The first goal is to make FOP level B conformant. Level A is a superset of level B and involves preserving the structural and semantic properties of the source document ("Tagged PDF").

Level B Conformance

The primary purpose of level B conformance is to represent electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files. This puts some constraints on the application generating the PDF files. Examples of such constraints are:

Implementation in Apache FOP

Outputting PDF/A-1 should be an optional feature, as it may restrict the feature set of Apache FOP. For example, certain applications may want to embed EPS files directly in PDF files. As can be seen above, however, this feature is prohibited in PDF/A-1.

The class PDFDocument should get a flag that turns on PDF/A-1 functionality. The PDF library itself should check conformance wherever possible and throw an Exception if a breach of PDF/A-1 conformance is detected. But the PDF library cannot detect everything, for example violations inside a page stream. Therefore, the PDFRenderer (and probably PDFGraphics2D, too) needs to do similar checks if PDF/A-1 conformance is activated.

Tasks identified so far for making FOP PDF/A-1b compatible are:
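The flag-and-exception approach described in this section could look roughly like the following. This is only a minimal sketch: the class names ConformanceException and SketchDocument and the method verifyEpsAllowed are hypothetical and chosen for illustration. They are not FOP's actual API.

```java
// Hypothetical sketch; these class names are illustrative, not FOP's actual API.

/** Thrown when an operation would breach the active PDF/A-1 profile. */
class ConformanceException extends RuntimeException {
    ConformanceException(String message) {
        super(message);
    }
}

/** Stand-in for a PDF document object that carries the PDF/A-1 flag. */
class SketchDocument {
    private final boolean pdfA1Enabled;

    SketchDocument(boolean pdfA1Enabled) {
        this.pdfA1Enabled = pdfA1Enabled;
    }

    boolean isPdfA1Enabled() {
        return pdfA1Enabled;
    }

    /** Called by the library before a directly embedded EPS file is written. */
    void verifyEpsAllowed() {
        if (pdfA1Enabled) {
            throw new ConformanceException(
                    "PDF/A-1 does not allow directly embedded EPS files");
        }
    }
}
```

With the flag off, verifyEpsAllowed() returns silently and the feature stays available. With the flag on, the caller gets an exception instead of a non-conformant file. A renderer would run the same kind of check for violations that only show up inside a page stream.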

Level A Conformance

Level A adds requirements so that the textual content and its structure can be recovered from a PDF file. This means supporting "Tagged PDF". Tasks identified so far, in addition to the above, for making FOP PDF/A-1a compatible are:

Implementing Support for PDF/A-2

PDF/A-2 is an updated version of PDF/A based on PDF 1.7 (ISO 32000-1). It relieves some limitations imposed by PDF/A-1 and allows constructs that appeared in newer versions of PDF.

The main element of interest in the context of FOP is the possibility to use transparency. Although transparency was already available in PDF 1.4, PDF/A-1 forbade it because the transparency model was not yet entirely well defined. Since PDF 1.7 defines the model properly, PDF/A-2 allows transparency.

Because of the backwards-compatibility of PDF, any PDF/A-1 compliant file should normally also be PDF/A-2 compliant (at the same conformance level).

Conformance Levels

PDF/A-2 introduces a new conformance level, level U. This is essentially level B plus the presence of ToUnicode maps. Therefore, level A is a superset of level U, which is a superset of level B.
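The superset chain between the levels can be modelled as an ordered enumeration. The sketch below is only an illustration: the enum ConformanceLevel and its methods satisfies and isDefinedFor are hypothetical names, not taken from FOP.

```java
/** Hypothetical sketch of the PDF/A conformance levels and their ordering. */
enum ConformanceLevel {
    // Declared from weakest to strongest so that ordinal() mirrors the superset chain.
    B, U, A;

    /** True if a file conforming to this level also meets the requirements of other. */
    boolean satisfies(ConformanceLevel other) {
        return this.ordinal() >= other.ordinal();
    }

    /** Level U only exists from PDF/A-2 onwards; A and B exist in both parts. */
    boolean isDefinedFor(int part) {
        if (part != 1 && part != 2) {
            throw new IllegalArgumentException("Only PDF/A-1 and PDF/A-2 are covered here");
        }
        return this != U || part == 2;
    }
}
```

For example, ConformanceLevel.A.satisfies(ConformanceLevel.U) is true, while ConformanceLevel.U.isDefinedFor(1) is false because PDF/A-1 has no level U.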

Some confusion can occur when mixing PDF/A-1 and PDF/A-2:

Implementation

We can largely rely on the current implementation of PDF/A-1. We just need to add the constants for PDF/A-2 and lift the constraint on transparency when targeting PDF/A-2.
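A minimal sketch of how the transparency constraint could depend on the targeted PDF/A part is shown here. The names PdfAPart and TransparencyRule are hypothetical and only illustrate the idea. They do not describe FOP's existing classes.

```java
/** Hypothetical constants for the targeted PDF/A part (NONE = plain PDF). */
enum PdfAPart { NONE, PDFA_1, PDFA_2 }

/** Hypothetical helper that decides whether transparency may be written. */
final class TransparencyRule {
    private TransparencyRule() { }

    static boolean isTransparencyAllowed(PdfAPart part) {
        // Only PDF/A-1 forbids transparency; plain PDF and PDF/A-2 permit it.
        return part != PdfAPart.PDFA_1;
    }

    static void verifyTransparencyAllowed(PdfAPart part) {
        if (!isTransparencyAllowed(part)) {
            throw new IllegalStateException("Transparency is not allowed in PDF/A-1");
        }
    }
}
```

Keeping the rule in one place means that adding PDF/A-2 only requires a new constant and no change to the callers that already perform the PDF/A-1 check.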

We should let the user choose between conformance levels B and U. From FOP's point of view the two are equivalent, since ToUnicode maps are always generated, yet it is better if the user can find the conformance level they selected in the XMP metadata.
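In XMP, a PDF/A file identifies its part and conformance level through the pdfaid:part and pdfaid:conformance properties. The sketch below shows how the user's selection could be carried through to that fragment. The class PdfAIdSketch and its method are hypothetical, and the fragment assumes that the rdf prefix is declared on an enclosing rdf:RDF element.

```java
/** Hypothetical helper that renders the PDF/A identification part of an XMP packet. */
final class PdfAIdSketch {
    /** Namespace of the PDF/A identification schema. */
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    private PdfAIdSketch() { }

    static String toXmpDescription(int part, char level) {
        if (part != 1 && part != 2) {
            throw new IllegalArgumentException("Unsupported PDF/A part: " + part);
        }
        // Level U was introduced with PDF/A-2; A and B are valid for both parts.
        boolean validLevel = level == 'A' || level == 'B' || (level == 'U' && part == 2);
        if (!validLevel) {
            throw new IllegalArgumentException(
                    "Level " + level + " is not defined for PDF/A-" + part);
        }
        return "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\">\n"
                + "  <pdfaid:part>" + part + "</pdfaid:part>\n"
                + "  <pdfaid:conformance>" + level + "</pdfaid:conformance>\n"
                + "</rdf:Description>";
    }
}
```

Calling PdfAIdSketch.toXmpDescription(2, 'U') records level U even though the generated content would be identical for level B, which is the reason for exposing the choice to the user.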

Problems

A major nuisance is that ISO 19005-1:2005(E) is a standard that is not freely available. You have to buy licenses from the International Organization for Standardization (ISO). The price for a single-user license is 114 CHF (around 87 USD). This may make it difficult to maintain PDF/A-1 compatibility once it has been implemented, as not every committer and contributor may have access to a copy of the specification. You can find freely available copies of drafts of this standard on the net. Please note that there may be differences from the actual and currently valid ISO document (ISO 19005-1:2005(E), corrected version, 2005-12-01).

Publicly available copies (found by using public search engines):

PDFAConformanceNotes (last edited 2012-10-19 18:40:11 by VincentHennebert)