PDF/A-1 Conformance Notes

This document discusses topics around making Apache FOP conformant to PDF/A-1 (ISO 19005-1:2005(E)). PDF/A-1 is an ISO standard that defines additional requirements and restrictions on PDF documents to make them suitable for long-term preservation.

References:

Conformance Levels

PDF/A-1 defines two conformance levels: A and B. These are discussed separately below. The first goal is to make FOP level B conformant. Level A is a superset of level B and involves preserving the structural and semantic properties of the source document ("Tagged PDF").

Level B Conformance

The primary purpose of level B conformance is to define a file format based on PDF, known as PDF/A, that represents electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files. This puts some constraints on the application generating the PDF files. Examples of such constraints are:

Implementation in Apache FOP

Outputting PDF/A-1 should be an optional feature, as it may restrict the feature set of Apache FOP. For example, certain applications may want EPS files embedded directly in PDF files. As can be seen above, however, this feature is prohibited in PDF/A-1.

The class PDFDocument should get a flag that turns on PDF/A-1 functionality. The PDF library itself should check conformance wherever possible, throwing an exception if a breach of PDF/A-1 conformance is detected. But the PDF library cannot detect everything, for example violations inside a page stream. Therefore, the PDFRenderer (and probably PDFGraphics2D, too) needs to do similar checks if PDF/A-1 conformance is activated.

Tasks identified so far for making FOP PDF/A-1b compatible are:
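
The flag-and-check approach described in this section could look roughly like the following Java sketch. It is only an illustration under stated assumptions: SketchPdfDocument, SketchRenderer, PdfConformanceException and all of their methods are hypothetical names invented here, not classes or methods of Apache FOP. The sketch shows a document-level flag, a library-level check that rejects embedded EPS, and a renderer-level check for page stream content that the library cannot inspect itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception type signalling a breach of PDF/A-1 conformance. */
class PdfConformanceException extends RuntimeException {
    PdfConformanceException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for a PDF library document; not FOP's PDFDocument. */
class SketchPdfDocument {
    private final boolean pdfA1Enabled;
    private final List<String> embeddedFiles = new ArrayList<>();

    SketchPdfDocument(boolean pdfA1Enabled) {
        this.pdfA1Enabled = pdfA1Enabled;
    }

    boolean isPdfA1Enabled() {
        return pdfA1Enabled;
    }

    /** Library-level check: embedded EPS is rejected when the flag is on. */
    void embedEps(String name) {
        if (pdfA1Enabled) {
            throw new PdfConformanceException(
                    "PDF/A-1 mode: embedded EPS is not allowed: " + name);
        }
        embeddedFiles.add(name);
    }

    int embeddedFileCount() {
        return embeddedFiles.size();
    }
}

/** Hypothetical stand-in for a renderer that writes page streams. */
class SketchRenderer {
    private final SketchPdfDocument document;

    SketchRenderer(SketchPdfDocument document) {
        this.document = document;
    }

    /**
     * Renderer-level check for content the library cannot inspect.
     * The caller states whether the content it is about to write is allowed.
     */
    void writePageContent(String description, boolean allowedInPdfA1) {
        if (document.isPdfA1Enabled() && !allowedInPdfA1) {
            throw new PdfConformanceException(
                    "PDF/A-1 mode: " + description + " is not allowed in a page stream");
        }
        // A real renderer would emit page stream operators here.
    }
}

class PdfAModeSketch {
    public static void main(String[] args) {
        SketchPdfDocument plain = new SketchPdfDocument(false);
        plain.embedEps("logo.eps"); // accepted: PDF/A-1 mode is off
        System.out.println("plain embedded files: " + plain.embeddedFileCount());

        SketchPdfDocument archival = new SketchPdfDocument(true);
        try {
            archival.embedEps("logo.eps");
        } catch (PdfConformanceException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the flag on the document object means that both the library and the renderer consult a single source of truth, and with the flag off nothing is rejected, which matches the idea that PDF/A-1 output stays optional.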

Level A Conformance

Level A adds requirements so that the textual content and its structure can be recovered from a PDF file. This means supporting "Tagged PDF". Tasks identified so far, in addition to the above, for making FOP PDF/A-1a compatible are:

Problems

A major nuisance is that ISO 19005-1:2005(E) is a standard that is not freely available. You have to buy a license from the International Organization for Standardization (ISO). The price for a single-user license is 114 CHF (around 87 USD). This may make it difficult to maintain PDF/A-1 compatibility once it has been implemented, as not every committer and contributor may have access to a copy of the specification. You can find freely available copies of drafts of this standard on the net. Please note that there may be differences from the actual and currently valid ISO document (ISO 19005-1:2005(E), corrected version, 2005-12-01).

Publicly available copies (found by using public search engines):

Additional Links

last edited 2006-02-22 11:07:36 by JeremiasMaerki