Friday, 3 April 2009

Draft data dictionary and schema for document significant properties

A data dictionary and related schema have been drafted for documents that are largely text, but where creators can specify formatting, such as fonts, colours, text size and page layout; where they can embed images and other items; and where they might take advantage of application features, such as the ability to create annotations or page thumbnails. The formats specifically targeted are OpenDocument Text, PDF, StarOffice, MS Works, MS Word and WordPerfect. Significant properties relating to appearance, behaviour, content and structure are recorded, and it's anticipated that this metadata could be plugged into PREMIS 2.0's objectCharacteristicsExtension.

The designers, from the California Digital Library and Harvard University Library, are seeking comments from the digital preservation community. Semantic units are: PageCount, WordCount, CharacterCount, ParagraphCount, LineCount, TableCount, GraphicsCount, Language, Fonts, FontName, IsEmbedded, Features. You can see the current schema in full at http://www.fcla.edu/dls/md/docmd.xsd
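To get a feel for what a record built from these semantic units might look like, here is a rough Python sketch. The element names come from the list above (with the line count written as LineCount), but the namespace URI, the `document` root element, the nesting of FontName and IsEmbedded inside a repeated Fonts element, and the `build_record` helper are all my own assumptions for illustration; check the schema itself before relying on any of them.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: an assumption for illustration, NOT taken from the schema.
DOCMD_NS = "http://example.org/docmd"

# Count-style semantic units named in the draft data dictionary.
COUNT_UNITS = [
    "PageCount", "WordCount", "CharacterCount", "ParagraphCount",
    "LineCount", "TableCount", "GraphicsCount",
]


def _tag(name):
    """Return a namespace-qualified tag in ElementTree's {uri}name form."""
    return f"{{{DOCMD_NS}}}{name}"


def build_record(counts, language, fonts, features):
    """Build a hypothetical document-properties record.

    counts:   dict mapping a count unit name to an integer
    language: language string for the document
    fonts:    list of (font_name, is_embedded) tuples
    features: list of feature value strings
    """
    root = ET.Element(_tag("document"))  # hypothetical root element name
    for unit in COUNT_UNITS:
        if unit in counts:
            ET.SubElement(root, _tag(unit)).text = str(counts[unit])
    ET.SubElement(root, _tag("Language")).text = language
    # Assumed structure: one Fonts element per font, holding its name
    # and whether it is embedded in the file.
    for font_name, is_embedded in fonts:
        font_el = ET.SubElement(root, _tag("Fonts"))
        ET.SubElement(font_el, _tag("FontName")).text = font_name
        ET.SubElement(font_el, _tag("IsEmbedded")).text = str(is_embedded).lower()
    for feature in features:
        ET.SubElement(root, _tag("Features")).text = feature
    return root
```

Calling `ET.tostring(build_record({"PageCount": 12}, "en", [("Arial", True)], ["hasAnnotations"]), encoding="unicode")` serialises the record as XML, which is the sort of fragment that could then sit inside a PREMIS objectCharacteristicsExtension.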

This looks like a useful addition to preservation metadata, provided tool support for extracting the information and populating metadata records follows. I think the list of values for 'Features' - isTagged, hasLayers, hasTransparancy, hasOutline, hasThumbnails, hasAttachments, hasForms, hasAnnotations - may need extending (hasFootnotes, hasEndnotes?), and it would be good to see some definitions and examples of the existing values.
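One way to see where the 'Features' list falls short is to compare what an extraction tool reports against the draft vocabulary. The sketch below is purely illustrative: the feature values are copied as they appear in this post, while `unsupported_features` and the idea of a tool handing back a list of feature strings are my assumptions, not part of the draft.

```python
# Feature values as listed in this post for the draft schema.
DRAFT_FEATURES = {
    "isTagged", "hasLayers", "hasTransparancy", "hasOutline",
    "hasThumbnails", "hasAttachments", "hasForms", "hasAnnotations",
}

# Possible extensions suggested in this post; not in the draft.
SUGGESTED_FEATURES = {"hasFootnotes", "hasEndnotes"}


def unsupported_features(observed, vocabulary=DRAFT_FEATURES):
    """Return, sorted, the observed features that the vocabulary cannot express."""
    return sorted(set(observed) - set(vocabulary))
```

For a document with annotations and footnotes, `unsupported_features(["hasAnnotations", "hasFootnotes"])` returns `["hasFootnotes"]`, while passing `DRAFT_FEATURES | SUGGESTED_FEATURES` as the vocabulary returns an empty list.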

I wonder if we need a different data dictionary and schema for slideshows? This one might be adequate with some additions to cover things like animations, timings, etc. Seeing this data dictionary also reminds me that we need to look at where the Planets folk are up to on their significant properties work (XCDL/XCEL).
