PDF/A Compliance

Overview

Inspired by the Isartor test set for validating PDF/A compliance, we are working on a similar set of negative tests for basic XMP compliance (PDF/A XMP TechNotes).

While it is clear that this work needs to be done, nobody has applied the required resources to the issue since the release of PDF/A 19005-1 in 2005. We're helping to fill the gap.

Approach

The PDF/A extension schemas were one of the few new features introduced with PDF/A. While these schemas are a great way to clearly specify schemas used by XMP in PDF/A files, introducing new features in a specification without at least one existing reference implementation has its pitfalls.

Three years later we're catching up, and in trying to validate XMP we're discovering holes and errors in the PDF/A extension schemas. For example, TechNote 0009 is not clear about 'required' vs 'optional' properties. For pdfaSchema, it would be good if only one of pdfaSchema:property or pdfaSchema:valueType were required to contain members; this would make it possible to implement 'Auxiliary Schemas' such as the 'Dimensions' value type.

This immaturity of the PDF/A extension schemas is what led us to implement all the pre-defined schemas of PDF/A (and their required auxiliary schemas) using the very same PDF/A extension schemas. By eating our own dog food in this way, we got much closer to the nuts and bolts issues surrounding the PDF/A extension schemas.

Deliverables

While each vendor will obviously implement their own XMP validator for PDF/A validation and conversion, there are some areas where we can easily collaborate. We believe that it is in all our interests to openly share an RDF and PDF/A compliant XMP implementation of the pre-defined schemas required to validate PDF/A files. This implementation is available to members and non-members of the PDF/D Consortium under the regular LGPL license (in a nutshell: all we require is attribution/credit). Feedback, corrections and questions are welcome.

RDF Schemas for PDF/A: pdfa.rdf.zip (1.1 - last updated in September 2009)

In addition to the schema implementation, the consortium is working on a rich set of validation tests for XMP using the same testing methodology as the Isartor compliance tests. These tests are only available to PDF/D Consortium members. This screenshot shows a sampling of the tests, illustrating how the PDF/A or TechNote clause and the test name are used in naming each test case file:

pdfaValidate Schema

The XMP Specification makes provision for extending existing XMP Properties with Qualifier Properties that are ignored by applications that are not aware of them. We used this feature and the pdfaValidate schema to extend both pdfaProperty and pdfaField to add validation information. When defining the schemas we wish to validate, we can now add the following attributes:

status

Description: Used by the validator to flag errors of omission or inclusion, or to raise warnings.
Type: Closed Choice of Text
Values: required|prohibited|deprecated|restricted|recommended|ignored
Note: 'deprecated' is similar to 'prohibited', except that validators flag it as a warning and not an error.
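As a rough illustration of how a validator might act on the status values, here is a minimal Python sketch. The function name check_status and the message texts are hypothetical. Only the handling of 'required', 'prohibited' and 'deprecated' follows from the description above; treating 'recommended' as a warning on omission, and reporting nothing for 'restricted' and 'ignored', are assumptions made for this sketch.

```python
def check_status(status, present):
    """Return (severity, message) for one property, or None if nothing to report.

    status  -- the pdfaValidate:status value declared for the property
    present -- True if the property occurs in the XMP packet being validated
    """
    if status == "required":
        # Error of omission.
        return None if present else ("error", "required property is missing")
    if status == "prohibited":
        # Error of inclusion.
        return ("error", "prohibited property is present") if present else None
    if status == "deprecated":
        # Like 'prohibited', but only a warning.
        return ("warning", "deprecated property is present") if present else None
    if status == "recommended":
        # Assumption: a missing recommended property is worth a warning.
        return None if present else ("warning", "recommended property is missing")
    if status in ("restricted", "ignored"):
        # Assumption: these two values produce no finding by themselves here.
        return None
    raise ValueError(f"unknown status: {status!r}")
```

For instance, check_status("deprecated", True) yields a warning, while check_status("required", True) yields nothing to report.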

constraint

Description: Regular expression used to constrain "Closed Choice of" values. We still need a way to flag Open vs Closed. Regular expressions always need to match all input (start with '^' and end with '$'). Other valid constraint values include:
'base64': used to validate the Thumbnail xapGImg:image property, for example.
Numeric ranges, depicted as '[0,255]', '(0,)', '[-128,127]', etc.
Comparison to other properties: '>=@OtherProperty', '==@OtherProperty'
Type: Text
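The constraint forms listed above could be evaluated along the following lines. This is a sketch under stated assumptions, not the consortium's implementation: the function check_constraint and the other_values mapping are hypothetical, square brackets are read as inclusive bounds and parentheses as exclusive bounds (ordinary interval notation), a missing bound means unbounded, and comparison operators beyond the '>=' and '==' shown above are included only for completeness.

```python
import base64
import binascii
import operator
import re

# '[0,255]', '(0,)', '[-128,127]': either bound may be omitted.
_NUM = r"(-?\d+(?:\.\d+)?)?"
_RANGE = re.compile(r"^([\[(])\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*([\])])$")
# '>=@OtherProperty', '==@OtherProperty'
_COMPARE = re.compile(r"^(>=|<=|==|!=|>|<)@(.+)$")
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def _to_float(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def check_constraint(constraint, value, other_values=None):
    """Return True if the string value satisfies the constraint string.

    other_values maps property names to their string values and is
    only consulted for '@OtherProperty' comparisons (hypothetical API).
    """
    other_values = other_values or {}
    if constraint == "base64":
        try:
            # Assumption: whitespace inside the encoded data is tolerated.
            base64.b64decode("".join(value.split()), validate=True)
            return True
        except (binascii.Error, ValueError):
            return False
    if constraint.startswith("^") and constraint.endswith("$"):
        # Regular expressions must match the whole input.
        return re.fullmatch(constraint, value) is not None
    m = _RANGE.match(constraint)
    if m:
        opening, low, high, closing = m.groups()
        number = _to_float(value)
        if number is None:
            return False
        if low is not None:
            if number < float(low) or (opening == "(" and number == float(low)):
                return False
        if high is not None:
            if number > float(high) or (closing == ")" and number == float(high)):
                return False
        return True
    m = _COMPARE.match(constraint)
    if m:
        op, name = m.groups()
        if name not in other_values:
            return False
        left, right = _to_float(value), _to_float(other_values[name])
        if left is None or right is None:
            # Fall back to comparing the raw strings.
            left, right = value, other_values[name]
        return _OPS[op](left, right)
    raise ValueError(f"unrecognised constraint: {constraint!r}")
```

A call such as check_constraint("[0,255]", "128") checks a numeric range, while check_constraint(">=@OtherProperty", "5", {"OtherProperty": "3"}) compares against another property's value. Dispatching on the leading '^' and trailing '$' keeps a range like '[0,255]' from being misread as a regular expression character class.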

predefined

default

Description: A default value for a required property. It shall be of the same type as the property value.
Type: Text
Values: string holding a value of the property's value type

subst

Description: Name of a predefined schema property to be substituted for this one; used with pdfaValidate:status = "prohibited | deprecated".
Type: Text
Values: string with qualified property name, e.g. "xmp:Identifier"

count

standard

Description: This value determines which specification is violated when constraints are not met.
Type: Closed Choice of Text
Values: pdf|pdfa|pdfd|xmp

clause

Description: This is the clause in the specification which is violated when constraints are not met.
Type: Text
Values: string, typically dot delimited integers
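To show how 'standard' and 'clause' might travel together with a validation finding, here is a small Python sketch. The Violation class, its field names and the output format are hypothetical, and the clause number in the usage note is a made-up placeholder; only the closed list of standard values comes from the definition above.

```python
from dataclasses import dataclass

# Closed choice of values for pdfaValidate:standard.
STANDARDS = ("pdf", "pdfa", "pdfd", "xmp")


@dataclass(frozen=True)
class Violation:
    """One failed check, tagged with the specification and clause it violates."""
    property_name: str
    standard: str
    clause: str
    message: str

    def __post_init__(self):
        # Reject values outside the closed choice.
        if self.standard not in STANDARDS:
            raise ValueError(f"unknown standard: {self.standard!r}")

    def describe(self):
        """Return a one-line, human-readable summary of the violation."""
        return (f"{self.property_name}: {self.message} "
                f"({self.standard} clause {self.clause})")
```

With a placeholder clause, Violation("xmp:Identifier", "pdfa", "1.2.3", "prohibited property is present").describe() produces a single line naming the property, the problem, and the violated clause.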

This schema is defined as a regular PDF/A extension schema and is included in the pre-defined schema download. The fields 'clause' and 'standard' will already be familiar to those of you who have been following our work on the Open Compliance Reporting format.