JHOVE: a widely-used open source digital preservation tool

JHOVE is a widely-used open source digital preservation tool, used for validating formats of digital objects. The Open Preservation Foundation has assumed responsibility for this project and is in the process of creating a new permanent and sustainable home for JHOVE.

“I don’t know of any open source validator that is as efficient as JHOVE, able to handle about 12 formats, written in JAVA and as famous as it. There are surely some others, but one which includes PDF for free, I don’t know of any ... At this level, it is undeniably the opportunity for the whole digital archiving community to join efforts in order to maintain and improve the situation of this international tool” (Open Preservation Foundation).

JHOVE (JSTOR/Harvard Object Validation Environment) was originally developed in 2003 by Harvard and JSTOR to automate format-specific identification, validation, and characterization of digital resources. It was conceived to be integrated into the Ingest function of an OAIS (Open Archival Information System) and was made available under an open source license (GNU Lesser General Public License). It supports twelve file formats and was intended for wide international deployment.

The OPF is now stewarding the JHOVE software in line with its Software Maturity Model, which facilitates the development and release of patches and new modules, and is coordinating road-mapping and future development activities.

Identification: determining the format to which a digital object (DO) conforms;

Validation: checking whether the DO conforms to its format’s technical norms;

Characterization: providing a report of the DO’s salient properties.

Identification and validation are linked: even a trivial error found during validation can cause a DO to fail identification. Conformance to a format is determined at three levels:

well-formedness: a DO is well-formed if it meets the purely syntactic requirements for its format;

validity: a DO is valid if it is well-formed and it meets the higher-level semantic requirements for format validity;

consistency: a DO is consistent if it is valid and its internally extracted representation information is consistent with externally supplied representation information.
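The three levels nest, each one presupposing the level before it. A minimal sketch of that hierarchy in Python follows; the `Conformance` enum and the `conformance_level` function are hypothetical names chosen for illustration and are not part of JHOVE's API.

```python
from enum import IntEnum


class Conformance(IntEnum):
    """Hypothetical ordering of the levels: a higher value implies the lower ones."""
    NOT_WELL_FORMED = 0
    WELL_FORMED = 1
    VALID = 2
    CONSISTENT = 3


def conformance_level(syntax_ok: bool, semantics_ok: bool,
                      matches_external: bool) -> Conformance:
    """Return the highest level a DO reaches, given the outcome of each check."""
    if not syntax_ok:
        # Later checks are irrelevant once the syntax check fails.
        return Conformance.NOT_WELL_FORMED
    if not semantics_ok:
        return Conformance.WELL_FORMED
    if not matches_external:
        return Conformance.VALID
    return Conformance.CONSISTENT
```

Because the enum is ordered, a caller can write a test such as `level >= Conformance.VALID` to accept both valid and consistent objects.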

JHOVE reports only full conformance to a profile; that is, it focuses on the semantics of a file rather than its content. A file that is well-formed but not valid therefore still contains errors.

In May 2016, JHOVE 1.14 was released. This version adds three format modules: gzip, WARC and PNG. Among other features, it includes a black box testing module and support for Unicode 7.0.0.

JHOVE's design incorporates an API that can be used on its own to create compatible tools and applications. Developers wishing to recompile the JHOVE source code will require Apache Ant.

The JHOVE website provides user and developer documentation and is currently under review to ensure it is up to date and accurate. Installation of JHOVE requires solid knowledge of command line interfaces and experience with manually editing configuration files. Familiarity with metadata outputs is also essential.
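As an illustration of the command-line use mentioned above, the following Python sketch assembles a JHOVE invocation as an argument list. It assumes the launcher is called `jhove` and is on the PATH, and it relies on the `-m` (module), `-h` (output handler) and `-o` (output file) options from JHOVE's usage message. `PDF-hul` is used as an example module name, and `build_jhove_command` is a hypothetical helper rather than anything shipped with JHOVE.

```python
from typing import List, Optional


def build_jhove_command(path: str,
                        module: Optional[str] = None,
                        handler: str = "xml",
                        output: Optional[str] = None,
                        launcher: str = "jhove") -> List[str]:
    """Assemble the argument list for one JHOVE run (hypothetical helper)."""
    cmd = [launcher]
    if module:
        # Restrict the run to one format module, e.g. "PDF-hul".
        cmd += ["-m", module]
    # Select the output handler that formats the report.
    cmd += ["-h", handler]
    if output:
        # Write the report to a file instead of standard output.
        cmd += ["-o", output]
    cmd.append(path)
    return cmd
```

Calling `build_jhove_command("report.pdf", module="PDF-hul", output="report.xml")` yields a list that could be passed to `subprocess.run` on a machine where JHOVE is installed. Keeping the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.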

The SourceForge code repository includes a forum and also hosts a mailing list, along with the usual facilities for filing bug reports, feature requests and support requests.

On 11 October 2016, the OPF held a JHOVE Online Hack Day to improve the digital preservation community's knowledge of JHOVE errors. The aims were to write descriptions of the errors, to identify example files, and to begin to understand their preservation impact and what can be done about them.

For the JHOVE Online Hack Day, a collaborative Google document has been created to organize the tasks, with contributions from the Document Interest Group and the JHOVE Product Board, which welcome additional suggestions from the community.