Extracting technical metadata and embedded 'descriptive' metadata from audio files in multiple formats, including esoteric and proprietary ones.

Detailed description

Audio files intended for long-term preservation may be created outside the control of standardized archival workflows.

The ideal way to archive such files is lossless normalization to a standardized file type, together with an accurate record of the original file's technical metadata for inclusion in the recording's catalogue entry. Embedded descriptive metadata (for example, in ID3v1/ID3v2 tags or Broadcast WAVE BEXT chunks) may also hold information that can enrich catalogue data.
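To illustrate where these two kinds of metadata sit in a WAVE file, the following minimal Python sketch walks the RIFF chunk list and reads the `fmt ` chunk (technical metadata) and the `bext` chunk (descriptive metadata). It assumes the fixed 602-byte `bext` layout described in EBU Tech 3285. It does not handle RF64 files or decode the sub-format of WAVE_FORMAT_EXTENSIBLE files, and the output key names are illustrative choices, not a standard vocabulary.

```python
import struct
import sys
from typing import BinaryIO

# Byte offsets within the fixed 602-byte part of a BWF "bext" chunk (EBU Tech 3285).
# The dictionary keys are illustrative names chosen for this sketch.
BEXT_TEXT_FIELDS = {
    "description": (0, 256),
    "originator": (256, 288),
    "originator_reference": (288, 320),
    "origination_date": (320, 330),
    "origination_time": (330, 338),
}
BEXT_FIXED_SIZE = 602


def _text(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field, dropping padding and stray whitespace."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_wave_metadata(f: BinaryIO) -> dict:
    """Walk the RIFF chunks of a WAVE stream and collect fmt and bext fields."""
    riff, _riff_size, form = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    meta: dict = {}
    while True:
        header = f.read(8)
        if len(header) < 8:
            break  # no more chunks
        chunk_id, size = struct.unpack("<4sI", header)
        padded = size + (size & 1)  # chunk bodies are padded to an even length
        if chunk_id == b"fmt ":
            body = f.read(padded)
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", body[:16])
            meta.update(format_tag=tag, channels=channels,
                        sample_rate=rate, bits_per_sample=bits)
        elif chunk_id == b"bext":
            body = f.read(padded)[:size]
            for name, (start, end) in BEXT_TEXT_FIELDS.items():
                meta[name] = _text(body[start:end])
            low, high = struct.unpack("<II", body[338:346])
            meta["time_reference"] = low | (high << 32)  # samples since midnight
            meta["bext_version"] = struct.unpack("<H", body[346:348])[0]
            meta["coding_history"] = _text(body[BEXT_FIXED_SIZE:])
        else:
            f.seek(padded, 1)  # skip audio data and other chunks without reading them
    return meta


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, read_wave_metadata(fh))
```

Audio data is skipped with a relative seek, so even very large files are never loaded into memory; only the small metadata chunks are read.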

Normalizing to a single lossless audio format, or to a format of a different type, risks losing this information. Extracting the metadata before normalization is therefore vital.
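ID3v1 tags illustrate the risk. They occupy a fixed 128-byte trailer at the end of an MP3 file, outside the audio stream, so whether they survive normalization depends entirely on the conversion tool. The following minimal Python sketch captures such a tag before conversion. It assumes the common ID3v1/ID3v1.1 layout with ISO-8859-1 text, reports the genre only as its numeric index, and does not attempt ID3v2, which uses variable-length frames at the start of the file.

```python
import os
from typing import BinaryIO, Optional

ID3V1_SIZE = 128


def _field(raw: bytes) -> str:
    # ID3v1 text is ISO-8859-1, padded with NULs or spaces.
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(f: BinaryIO) -> Optional[dict]:
    """Return the ID3v1(.1) tag stored in the last 128 bytes, or None if absent."""
    f.seek(0, os.SEEK_END)
    if f.tell() < ID3V1_SIZE:
        return None
    f.seek(-ID3V1_SIZE, os.SEEK_END)
    tag = f.read(ID3V1_SIZE)
    if tag[:3] != b"TAG":
        return None
    meta = {
        "title": _field(tag[3:33]),
        "artist": _field(tag[33:63]),
        "album": _field(tag[63:93]),
        "year": _field(tag[93:97]),
        "genre_id": tag[127],  # index into the ID3v1 genre list
    }
    comment = tag[97:127]
    if comment[28] == 0 and comment[29] != 0:
        # ID3v1.1: a zero byte followed by a non-zero byte encodes the track number.
        meta["comment"] = _field(comment[:28])
        meta["track"] = comment[29]
    else:
        meta["comment"] = _field(comment)
    return meta
```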

While JHOVE performs a similar function for WAVE and MP3 files, its support for other formats is limited.

Issue champion

Adam Tovell

Possible approaches

Software tools exist for describing the technical makeup of digital audio files and their embedded metadata, but they are limited in functionality and format support, and many are built around manual or GUI-based interfaces. Achieving a single, simple goal can therefore require several tools, or parts of several tools. Manually exporting useful data from such tools is perfectly viable for a few files of limited types, but becomes inefficient for large, technically varied collections. Ideally, this would be solved by a single command-line tool that exports metadata in a directly usable or transformable format (XML, for instance), which could be incorporated into batch scripts to automate large-scale metadata extraction.
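A minimal Python sketch of such a batch-oriented tool follows. It walks a directory tree, dispatches each file to a per-extension extractor, and writes a single XML document to standard output, where a batch script can redirect it or transform it further (for example with XSLT). The `<audioMetadata>`/`<file>` element names and the `EXTRACTORS` registry are hypothetical, not an existing schema or tool. The only extractor included uses the standard-library `wave` module, which handles uncompressed PCM WAVE files. Extractors for other formats would be registered in the same table.

```python
import argparse
import sys
import wave
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, List, Optional


def wave_info(path: Path) -> Dict[str, str]:
    """Technical metadata for PCM WAVE files, via the standard-library wave module."""
    with wave.open(str(path), "rb") as w:
        frames, rate = w.getnframes(), w.getframerate()
        return {
            "channels": str(w.getnchannels()),
            "sample_rate": str(rate),
            "bits_per_sample": str(w.getsampwidth() * 8),
            "duration_seconds": f"{frames / rate:.3f}",
        }


# Hypothetical registry: one extractor per file extension; other formats plug in here.
EXTRACTORS: Dict[str, Callable[[Path], Dict[str, str]]] = {".wav": wave_info}


def build_report(root: Path) -> ET.Element:
    """Walk a directory tree and build one <file> element per recognised audio file."""
    report = ET.Element("audioMetadata")
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None:
            continue  # unsupported format: skipped in this sketch
        entry = ET.SubElement(report, "file", path=str(path.relative_to(root)))
        try:
            for key, value in extractor(path).items():
                ET.SubElement(entry, key).text = value
        except Exception as exc:  # record the failure instead of aborting the batch
            entry.set("error", str(exc))
    return report


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Export audio metadata as XML.")
    parser.add_argument("root", type=Path, help="directory to scan")
    args = parser.parse_args(argv)
    tree = ET.ElementTree(build_report(args.root))
    ET.indent(tree)  # pretty-printing; requires Python 3.9+
    tree.write(sys.stdout, encoding="unicode")
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as a script, it could be run as `python extract.py /path/to/collection > report.xml` (the script name is arbitrary). Recording a per-file `error` attribute rather than raising means one corrupt file does not halt extraction across a large collection.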