Identification and description

PDF/A is a family of ISO standards for constrained forms of Adobe PDF intended to be suitable for long-term preservation of page-oriented documents for which PDF is already being used in practice. The PDF/A standards are developed and maintained by a working group with representatives from government, industry, and academia and active support from Adobe Systems Incorporated. The working group is WG 5 of Technical Committee ISO/TC 171, Document management applications, Subcommittee SC 2, Application issues [ISO TC171/SC2/WG5]. This group works in cooperation with: ISO/TC130, Graphics technology; ISO/TC42, Photography; and ISO/TC46/SC11, Information and documentation, Archives/records management.

PDF/A-1, the first PDF/A standard [ISO 19005-1:2005], was based on PDF version 1.4 and published in 2005. PDF/A-2, as defined in ISO 19005-2:2011, extends the capabilities of PDF/A-1 and is based on PDF version 1.7 (as defined in ISO 32000-1). PDF/A-3 adds a single but highly significant feature to its predecessor PDF/A-2: it permits files in any other format to be embedded within a PDF/A file, not just other PDF/A files (as PDF/A-2 allows). Many proponents intend that such embedded files not be considered part of the archival payload. However, use cases are emerging in which the embedded files would likely warrant preservation by archival institutions.

PDF/A attempts to maximize:

Device independence

Self-containment

Self-documentation

The constraints include:

Audio and video content are forbidden

JavaScript and executable file launches are prohibited

All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering

Color spaces must be specified in a device-independent manner

Encryption is disallowed

Use of standards-based metadata is mandated
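Conformance validators check these constraints by parsing the full PDF object structure. As a rough illustration only, the Python sketch below scans a file's raw bytes for PDF name tokens associated with some of the forbidden features (`/JavaScript`, `/JS`, `/Launch`, `/Encrypt`, `/Movie`, `/Sound`). The token list and the function name `scan_forbidden_features` are assumptions made for this example; the sketch does not check font embedding or color spaces and is no substitute for a real validator.

```python
from __future__ import annotations

import re

# Illustrative (not normative) mapping from PDF name tokens to the PDF/A
# constraint whose violation they suggest.
FORBIDDEN_NAMES = {
    b"JavaScript": "JavaScript",
    b"JS": "JavaScript",
    b"Launch": "executable file launch",
    b"Encrypt": "encryption",
    b"Movie": "video",
    b"Sound": "audio",
}

# A PDF name token ends at whitespace, a delimiter, or end of data; the
# negative lookahead stops "/JS" from matching inside a longer name like "/JSX".
_NAME_RE = re.compile(
    rb"/(" + b"|".join(re.escape(n) for n in FORBIDDEN_NAMES) + rb")(?![^\s()<>\[\]{}/%])"
)


def scan_forbidden_features(data: bytes) -> set[str]:
    """Return the forbidden features whose name tokens appear in the raw bytes.

    Only uncompressed object dictionaries are visible; keys stored inside
    compressed object streams are missed, so an empty result proves nothing.
    """
    return {FORBIDDEN_NAMES[m.group(1)] for m in _NAME_RE.finditer(data)}
```

A call such as `scan_forbidden_features(open("report.pdf", "rb").read())` (the file name is hypothetical) returns a set such as `{"encryption"}` when an `/Encrypt` entry is present in the trailer.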

The PDF/A standards define levels of conformance: conformance level A ("accessible") satisfies all requirements in the specification; level B ("basic") and level U ("Unicode", introduced with PDF/A-2) are lower levels of conformance. They still satisfy the requirements of ISO 19005 regarding the visual appearance of electronic documents but are less demanding as to representation of structural or semantic properties; level U additionally requires that all text can be mapped to Unicode.
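A file's part and conformance level are recorded together in its metadata, and not every combination is valid: PDF/A-1 defines only levels A and B, while PDF/A-2 and PDF/A-3 also define level U. The minimal Python sketch below, assuming only the three parts discussed here, checks a combination and builds the conventional short label such as "PDF/A-1b". The names `ALLOWED_LEVELS` and `pdfa_label` are hypothetical.

```python
# Conformance levels defined for each PDF/A part covered in this description.
# Level U (Unicode) first appears in PDF/A-2; PDF/A-1 has only A and B.
ALLOWED_LEVELS = {
    1: frozenset({"A", "B"}),
    2: frozenset({"A", "B", "U"}),
    3: frozenset({"A", "B", "U"}),
}


def pdfa_label(part: int, conformance: str) -> str:
    """Return a label such as 'PDF/A-1b', or raise ValueError if invalid."""
    # Lenient about case and surrounding whitespace (a design choice here).
    level = conformance.strip().upper()
    allowed = ALLOWED_LEVELS.get(part)
    if allowed is None:
        raise ValueError(f"unsupported PDF/A part: {part}")
    if level not in allowed:
        raise ValueError(f"level {level!r} is not defined for PDF/A-{part}")
    return f"PDF/A-{part}{level.lower()}"
```

For instance, `pdfa_label(1, "B")` gives "PDF/A-1b", while `pdfa_label(1, "U")` raises `ValueError` because level U does not exist in PDF/A-1.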

Production phase

A final-state format for delivery to end users and long-term preservation of the document as disseminated to users.

Local use

LC experience or existing holdings

LC was represented on the working group for the original PDF/A standard and continues to be active in the development of new versions.

LC preference

The Library of Congress expresses preferences for formats for content (primarily in physical form) for its collections through the "Best Edition" specification from the U.S. Copyright Office in Circular 7b. The 08/2010 revision of Circular 7b lists, in order of preference, the formats acceptable for mandatory deposit of Electronic Serials available only online. For page-oriented renditions, PDF/A appears first on the list. Other forms of PDF are acceptable, preferably with searchable text. [Note: When this Circular was published, PDF/A-2 and PDF/A-3 did not exist.]

In general, PDF/A-1 and PDF/A-2 are preferred formats for page-oriented textual (or primarily textual) documents when layout and visual characteristics are more significant than logical structure. Note that, for PDFs based on page images digitized by scanning, the source images are considered the master format if available. PDFs created from those images may be optimized for access convenience rather than sustainability.

Sustainability factors

Disclosure

A family of open standards, developed by a working group (WG 5) under ISO/TC 171/SC 2 (Document management applications, Application issues), for which AIIM (the Association for Information and Image Management) acts as secretariat. WG 5 is a joint working group that also includes ISO/TC 46/SC 11, Archives/records management; ISO/TC 130, Graphics technology; and ISO/TC 42, Photography.

Since the initial PDF/A standard was published in late 2005, tools for creation, conversion, and validation have reached the market steadily. Adobe's own Acrobat Professional 7.0 allowed saving files in a form compliant with the draft standard; Acrobat 8 and later versions support the standard as published. Microsoft Office 2007 supported creation of PDF/A files through Save as PDF, an add-on module. OpenOffice.org introduced support for PDF/A in release 2.4 (in early 2008).

Several commercial companies with products aimed at large enterprises have produced tools supporting the creation, migration, and validation of PDF/A files: Apago, Inc.; Visioneer (for scanning paper to PDF/A); Callas Software; Compart Systemhaus; Luratech; Nuance; and PDF Tools AG. Many of these companies are based in Europe, where growing EU requirements to use digital formats that are formal (preferably ISO) standards have produced more market pressure than in the U.S. Starting with version 0.93 (released in January 2007), the widely used open source Apache FOP (Formatting Objects Processor, based on the W3C's XSL-FO standard) has supported the minimal PDF/A profile, PDF/A-1b. A list of supporting products, compiled by AIIM from information supplied by vendors, can be found at http://www.aiim.org/Research-and-Publications/Standards/Articles/PDFA-Compliant-Products

The standards development process involved active participation by communities whose endorsement or adoption would create significant momentum for wider adoption, in the sense of a requirement or preference for PDF/A over generic PDF for archival deposit or submission. Important groups are government agencies and legislative and judicial institutions. At a meeting of the European DLM (Document Lifecycle Management) Forum in Helsinki in November 2006, Adobe reported migration of legacy "report silos" at several unnamed financial institutions. An increasing number of libraries and other archival institutions are recommending or requiring PDF/A. For pragmatic reasons, when PDF/A is mandated, PDF/A-1b is usually acceptable. Full PDF/A-1a compliance, with tagged document structure, is hard to achieve except in a workflow that anticipates that objective from initial document creation. A few examples of libraries and archives recommending or mandating PDF/A are: Virginia Tech for electronic theses; National Archives of Norway; University of Texas Libraries (for textual documents deposited in a digital repository).

Within the U.S. Government, there is an increasing level of encouragement for the use of PDF/A. The U.S. National Archives and Records Administration has participated actively in the development of PDF/A and provides guidance on its use for transfer of records by government agencies. The United States Patent and Trademark Office (USPTO) has requirements for PDFs that it accepts for electronic filing; the requirements are based on the PDF/A specification. Documents conforming to PDF/A-1 meet the USPTO requirements. According to an announcement available in February 2011 on the web site of PACER (Public Access to Court Electronic Records, for U.S. Federal Courts), "The Judiciary is planning to change the technical standard for filing documents in the Case Management and Electronic Case Filing (CM/ECF) system from PDF to PDF/A." In July 2012, a press release by Adlib indicated that the U.S. Department of State had replaced its cable system based on ASCII text with one based on PDF/A.

Between 2010 and early 2013, Adobe maintained a list of entities recommending or requiring use of PDF/A at http://www.adobe.com/enterprise/standards/pdfa/ (now available via the Internet Archive at http://web.archive.org/web/20130502134821/http://www.adobe.com/enterprise/standards/pdfa/).

Licensing and patents

Adobe holds a number of patents covering technology disclosed in the Portable Document Format (PDF) Specification, version 1.3 and later, and hence, by reference, in the ISO 19005 specifications. Because ISO 19005 is an ISO standard, the compliance of its approved parts with the ISO/IEC/ITU common patent policy has been vetted.

In association with the adoption of PDF, version 1.7 as an ISO standard (ISO 32000-1:2008), Adobe issued a Public Patent License, granting "every individual and organization in the world the royalty-free right, under all Essential Claims that Adobe owns, to make, have made, use, sell, import and distribute Compliant Implementations."

Transparency

Reading PDF/A files depends on compliant software tools, and building such tools requires considerable sophistication. PDF/A does not permit encryption.

Self-documentation

Support for embedding any form of metadata for a document is extremely good. Use of XMP is mandatory for basic descriptive and identifying metadata. Other XMP metadata packages can be embedded.

External dependencies

PDF/A is constrained to avoid external dependencies. All necessary fonts must be embedded.

Technical protection considerations

PDF/A does not permit encryption.

Quality and functionality factors

Text

Normal rendering

Good support is possible, but not guaranteed. The PDF/A format does not preclude creating documents from scanned page images; such files do not necessarily support indexing of the document text or extraction of text for quotation. See Notes below for more on creating PDF/A documents by scanning.

Integrity of document structure

The logical structure of a document is represented in a PDF/A file only if the creator, or the process used during creation, takes steps to incorporate structural tagging. The PDF/A standard recommends the representation of structural hierarchy, and conformance level A requires it.
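Whether a file declares structural tagging at all can be roughly estimated from its document catalog. The Python sketch below is a heuristic under stated assumptions: it looks in the raw bytes for a `/MarkInfo` dictionary containing `/Marked true` and for a `/StructTreeRoot` entry. The function name `has_structure_tagging` is hypothetical; the check misses a `/MarkInfo` given as an indirect reference or stored in a compressed object stream, and a positive result says nothing about the quality of the tagging.

```python
import re

# Inline /MarkInfo dictionary with Marked set to true, e.g. "/MarkInfo << /Marked true >>".
_MARKED_TRUE = re.compile(rb"/MarkInfo\s*<<[^>]*?/Marked\s+true")
# The /StructTreeRoot name token, not a longer name that merely starts with it.
_STRUCT_TREE = re.compile(rb"/StructTreeRoot(?![^\s()<>\[\]{}/%])")


def has_structure_tagging(data: bytes) -> bool:
    """Heuristic: the catalog appears to declare tagged content and a structure tree."""
    return bool(_MARKED_TRUE.search(data)) and bool(_STRUCT_TREE.search(data))
```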

Integrity of layout and display

PDF is designed to represent the layout of page-oriented documents.

Support for mathematics, formulae, etc.

Can be represented by embedded graphics.

Functionality beyond normal rendering

Annotations may be embedded. Bookmarks may be provided.

File type signifiers

Tag

Value

Note

Filename extension

pdf

The standard does not indicate that a different extension should be used to distinguish PDF from PDF/A.

The standard specifies that the PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema defined in the standard. This schema has two mandatory elements: pdfaid:part (integer) and pdfaid:conformance (closed list of text values). For example, a PDF/A-1b file should have the integer value 1 for pdfaid:part and the value "B" for pdfaid:conformance. See Notes below for an example of tagging in this schema.

File signature

See note.

The PRONOM entry for fmt/95 provides signatures used by the DROID software to identify PDF/A files. This identification is based on the namespace declaration in the PDF/A Identification metadata. See Notes below for an example of PDF/A Identification, including the namespace declaration, beginning xmlns.
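The exact DROID signature for fmt/95 is defined in PRONOM and is not reproduced here. A much simpler heuristic in the same spirit, sketched in Python, requires a `%PDF-` header at byte offset 0 and the PDF/A Identification namespace string somewhere in the file. `looks_like_pdfa` is a hypothetical name, and the check can find the namespace only when the XMP packet is stored uncompressed.

```python
# Byte patterns assumed for this heuristic (not DROID's actual signature).
PDF_HEADER = b"%PDF-"
PDFAID_NS = b"http://www.aiim.org/pdfa/ns/id/"


def looks_like_pdfa(data: bytes) -> bool:
    """Heuristic: PDF header at offset 0 plus the PDF/A Identification namespace."""
    return data.startswith(PDF_HEADER) and PDFAID_NS in data
```

A positive result only means the file claims PDF/A identification; it says nothing about whether the file actually conforms.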

Notes

General

Each PDF/A standard is aligned to the fullest extent possible with the then-current PDF/X standard.

A sample identification of part and conformance level is found within a mandatory metadata chunk for PDF/A Identification:
<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
  <pdfaid:part>1</pdfaid:part>
  <pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
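The identification above is an `rdf:Description` fragment; in a real file it sits inside the XMP packet referenced by the document catalog's `/Metadata` entry. The Python sketch below, which uses only the standard library's `xml.etree.ElementTree`, reads `pdfaid:part` and `pdfaid:conformance` from an XMP string. It assumes the packet has already been extracted from the PDF, and it wraps the sample in an `x:xmpmeta`/`rdf:RDF` envelope (an assumption, since the fragment alone does not declare the `rdf` prefix). The function name `read_pdfa_id` is hypothetical; it accepts both the element form shown above and the equivalent attribute form.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_id(xmp: str) -> tuple[int, str] | None:
    """Return (part, conformance) from an XMP packet, or None if either is absent."""
    root = ET.fromstring(xmp)
    found: dict[str, str] = {}
    # XMP may spread properties over several rdf:Description elements.
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in ("part", "conformance"):
            key = f"{{{PDFAID_NS}}}{prop}"
            value = desc.findtext(key)  # element form: <pdfaid:part>1</pdfaid:part>
            if value is None:
                value = desc.get(key)  # attribute form: pdfaid:part="1"
            if value is not None and prop not in found:
                found[prop] = value.strip()
    if "part" not in found or "conformance" not in found:
        return None
    return int(found["part"]), found["conformance"]


SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_id(SAMPLE_XMP))  # (1, 'B')
```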

In its Frequently Asked Questions (FAQs) about Transferring Permanent Records in PDF/A-1 to NARA, the U.S. National Archives provides guidance on image quality when creating PDF/A files by scanning page images. Such guidelines improve visual legibility. However, the effectiveness of optical character recognition depends heavily on the condition of the original document and the degree to which it employs small print, special fonts and complex layout. For documents that originate in electronic form and are primarily textual, it is almost always preferable to convert to PDF/A using a workflow that does not rely on printing and scanning (or an equivalent process using an intermediate raster image).

History

Developed to address the issue that large bodies of official documents and important information are maintained in PDF, but that PDF is not suitable as an archival format. The Administrative Office of the U.S. Courts was a driving force in forming a U.S. Committee to initiate an ISO standard based on PDF. The development of ISO 19005-1 was under the joint auspices of AIIM and NPES (National Printing Equipment Suppliers).

PDF Guidelines for EFS-Web (http://www.uspto.gov/ebc/portal/efs/pdf-guidelines.pdf). Guidelines for electronic filing of patent applications in PDF from the U.S. Patent and Trademark Office.

Preserving documents into the future: PDF/A Update (http://www.aiimhost.com/DLM/DLMHelsinki_MarcStraat.pdf). Presentation at the European Document Lifecycle Management (DLM) Forum in November 2006, by Marc Straat, Head of Standards Development Europe, Adobe Systems Europe Limited.

PDF Reference Archives (http://www.adobe.com/devnet/pdf/pdf_reference_archive.html). Documentation for versions of PDF prior to the current version. Includes access to errata and supplements.

Governments adopting PDF/A and PDF for official use (http://web.archive.org/web/20130502134821/http://www.adobe.com/enterprise/standards/pdfa/). Starting in 2010, Adobe listed governments adopting PDF/A or PDF as recommended, mandatory or acceptable. Link to early 2013 version from Internet Archive.

The PDF/A Competence Center publishes technical notes and guidelines aimed at its members (mainly software developers, systems integrators, and service providers). These publications include:

Morrissey, Sheila M., "The Network is the Format: PDF and the Long-term Use of Digital Content", Archiving 2012, (2012): pp. 200-203. ISBN: 978-0-89208-300-8 (print). Available online at Portico with permission of IS&T: The Society for Imaging Science and Technology.