Archives Hub Blog

26 January 2010

A model to bring museums, libraries and archives together

I am attending a workshop on the Conceptual Reference Model created by the International Council of Museums Committee on Documentation (CIDOC) this week.

The CIDOC Conceptual Reference Model (CRM) was created as a means of enabling information interchange and integration in the museum community and beyond. It "provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation".

It became an ISO standard in 2006 and a Special Interest Group continues to work to develop it and keep it in line with progress in conceptualisation for information integration.

The vision is to facilitate the harmonization of information across the cultural heritage sector, encompassing museums, libraries and archives, helping to create a global resource. The CRM is effectively an ontology describing concepts and relationships relevant to this kind of information. It is not in any sense a content standard; rather, it takes what is available and looks at the underlying logic, analysing the structure in order to progress semantic interoperability.

I come to this as someone with a keen interest in interoperability, and I think that the Archives community should engage more actively in cross-sectoral initiatives that benefit resource discovery. I am interested to find out more about the practical application and adoption of the CRM. My concern is that in the attempt to cover all eventualities, it seems like quite a complex model. It seeks to 'provide the level of detail and precision expected and required by museum professionals and researchers'. It covers detailed descriptions, contexts and relationships, which can often be very complex. The SIG is looking to harmonise the CRM with archival standards, which should take the cultural heritage sector a step further towards working together to share our resources.

I will be interested to learn more about the Model and I would like to consider how the CRM relates to what is going on in the wider environment, particularly with reference to Linked Data and, more basically, the increasing recognition of web architecture as the core means to disseminate information. Initiatives to bring data together, to interconnect, should move us closer to integrated information systems, but we want to make sure that we have complementary approaches.

You can read more about the Conceptual Reference Model on the CIDOC CRM website.

15 January 2010

English Language -- subjectless constructions

This is (probably) a final blog post referring to the recent survey by the UK Archives Discovery Network (UKAD) Working Group on Indexing and Name Authorities. Here we look in particular at subject indexing.

We received 82 responses to the question asking whether descriptions are indexed by subject. Most (42) do so, and follow recognised rules (UKAT, Unesco, LCSH, etc.). A significant proportion (29) index using in-house rules and some do not index by subject (18). Comments on this question indicated that in-house rules often supplement recognised standards, sometimes providing specialised terms where standards are too general (although I wonder whether these respondents have looked at Library of Congress headings, which are sometimes really quite satisfyingly specific, from the behaviour of the great blue heron to the history of music criticism in 20th century Bavaria).

Reasons given for subject indexing include:

it is good practice

it is essential for resource discovery

users find it easier than full-text searching

it gives people an indication of the subject strengths of collections

it imposes consistency

it is essential for browsing (for users who prefer to navigate in this way)

it brings together references to specific events

it brings out subjects not made explicit in keyword searching

it enables people to find out about things and about concepts

it may provide a means to find out about a collection where it is not yet fully described

it maximises the utility of the catalogues

it helps users identify the most relevant sources

it can indicate useful material that may not otherwise be found

it enables themes to be drawn out that may be missed by free-text searching

it can aid teachers

it helps with answering enquiries

it facilitates access across the library and archive

it meets the needs of academic researchers

The lack of staff resources was a significant reason given where subject indexing was not undertaken. Several respondents did not consider it to be necessary. Reasons given for this were:

the scope of the archive is tightly defined so subject indexing is less important

the benefits are not clear

the lack of a thesaurus that is specific enough to meet needs

a management decision that it is 'faddy'

the collections are too extensive

the cataloguing backlog is the priority

Name indexing is considered more important than subject indexing only by a small margin, and some respondents did emphasise that they index by name but not by subject. Comments here included the observation that subject indexing is more problematic because it is more subjective, that subjects may more easily be pulled out via automated means and that it depends upon the particular archive (collection). As with name and place indexing, subject indexing happens at all levels of description, and not predominantly at collection level. Comments suggest that subjects are only added at lower levels if appropriate there (and not appropriate at collection level).

For subjects, the survey asked how many terms are on average applied to each record. According to the options we gave, the vast majority use between one and six. However, some respondents commented that it varies widely, and one said that they might use a few thousand for a directory, which seems a little generous (possibly there is a misunderstanding here?).

Sources used for subjects included the usual thesauri, with UKAT coming out strongest, followed by Unesco and Library of Congress. A few respondents also referred to the Getty Art and Architecture Thesaurus. However, as with other indexes, in-house lists and a combination approach also proved common. It was pointed out in one comment that in-house lists should not be seen as lesser sources; one respondent has sold their thesaurus to other local archives. There were two comments about UKAT not being maintained, and hopes that the UKAD Network might take this on. And, indeed, when asked about the choice of sources used for subject indexing, UKAT again came up as a good thesaurus in need of maintenance.

Reasons given for the diverse choice of sources used included:

being led by what is within the software used for cataloguing

the need to work cross-domain

the need to be interoperable

the need to apply very specific subject terms

the need to follow what the library does

the importance of an international perspective

the lack of forethought on how users might use indexes

the lack of a specialist thesaurus in the subject area the repository represents (e.g. religious orders)