Task Force on Metadata

Summary Report

The CC:DA Task Force on Metadata has been given five charges. At its Midwinter
1999 meeting, the Task Force divided into four subcommittees to work on the first four charges.
The fifth charge (recommending, as needed, rule revisions to enable interoperability between
cataloging under AACR2R and metadata schemes) will be considered in light of the
conclusions reached on the first four charges. The subcommittees met at Midwinter 1999
and continued their discussions by email. Each group leader submitted a report.
Following are summaries of each group's deliberations.

Charge #1: Analyzing the resource description needs of libraries.

The group decided at Midwinter that it would:

review existing statements of principle on the purpose of the catalog

define library or catalog users and their expectations

look at the context in which we are describing our resources

Principles or purposes of the catalog, users, and the context in which catalogs are used would be
part of any analysis. The first step was to sketch out a general approach to, or understanding of,
these three elements: the purpose of catalogs, the users of catalogs, and the context in which
catalogs are used.

From catalog principles to user tasks (a summary of a preliminary review of
existing statements of principles for catalogs)

Based on a preliminary review of the literature, the group determined that one useful way to
analyze resource description needs is to break them into five basic tasks. Four of the tasks are
the four basic user tasks defined in Functional Requirements for Bibliographic Records (FRBR): find, identify, select, and obtain. The fifth task is taken from Rahmatollah Fattahi's paper "AACR2 and catalogue production technology" (Toronto conference paper, 1997); it is what he calls "housekeeping," meaning management or administration.
Thus, manage (or administer) is added to the tasks of find, identify, select, and obtain. The management task is simply a heading under which may be grouped those tasks that are necessary to an institution, its staff, or its business partners, vendors, etc., and that relate only indirectly to fulfilling the four basic user tasks. User needs are paramount; management needs are subordinate.

Clearly, within any operational context, so loosely defined a task as manage would need to be
further analyzed into precisely defined subtasks. However, the same can be said of
FRBR's four basic tasks without any diminishment in the usefulness of these concepts for analyzing the resource description needs of libraries, or for analyzing the appropriateness of a metadata element set, a library information management system, or a library taken as a whole as a service-oriented enterprise. These high-level abstractions (find, identify, select,
obtain, and manage) provide a solid basis for further analysis. Together they give us, as a profession, common ground on which to stand and from which to form a point of view.

Our resource description needs are grounded in the needs of our users to find,
identify, select, and obtain some information resource (book, article, map, score, data set, etc.). We judge our tools (catalogs, indexes, search engines, etc.) primarily by how well they support these tasks. Not only must they perform the user tasks well, they must also make the management or housekeeping task as simple, easy, flexible, and inexpensive as possible.

The myth of the library user, or, One size doesn't fit all (a reminder of the complexity of users both individually and as a group).

Based on armchair speculation, recollections from experience, and a preliminary
perusal of the literature, the group determined that it is not useful for this discussion to categorize
users into types. Such categories as expert, novice, scholar, student, researcher, citizen, child,
young adult, color blind, library staff, company employee, non-English speaking, only English
speaking, etc. may indeed be highly valuable in some particular instance, but the main
point to glean from this multiplicity is simply this: users, even individual users, are (from the point
of view of any scheme that sorts users into categories) multiple, complex, and protean.

Thus, library resource descriptions, although grounded in meeting the four basic tasks of the
user, cannot be based on any firm categorization of users into static, defined types. (For a
similar view, see Carl Lagoze's D-Lib article "From Static to Dynamic Surrogates: Resource
Discovery in the Digital Age" (D-Lib, June 1997). Lagoze's point is that the multiplicity of roles
that a single user can take on during resource discovery is one of the main
contributors to the complexity of that process.) From this we may conclude that institutions or
tools that base their resource description needs on simple, static notions of
user roles will be inadequate to the demands of real users. A catalog or metadata scheme
designed around the idea that users care only about numeric data and not about images (or
vice versa) will fail the user who cares about both.

Libraries and catalogs in a new context, or, How do networks affect
libraries and their catalogs?

The group's discussion and analysis of the context of the library catalog started from two
facts: we need to provide a coherent environment for users, and the library catalog is
no longer at the center.

The group determined that when thinking of the resource description needs of libraries we
cannot think only of the catalog. Or, rather, that when we do think of the catalog, we cannot
usefully think of it as a stand-alone or isolated tool. We must think of it as one important
tool among a host of tools. Thus, it is a new breed of catalog that we must imagine.

The network is making local or, more precisely, isolated catalogs extinct. We create our
institutional catalogs from shared resources and we share our catalogs via networked resources.
Our institutional catalogs are open to the world in ways we did not imagine they would be. A
user 10,000 miles away can use our catalog. That user may be searching a hundred other catalogs
at the same time. That user may want to integrate and format the search results in unpredictable
ways. An isolated or local-only catalog can't contribute much to a coherent environment for
users.

All the thinking about catalogs that we did before the ascendancy of the Internet (and the end of
the local-only or isolated catalog) must now be revised. Consider the statement in the Paris
Principles that "the catalogue should be an efficient instrument for ascertaining . . ." It is no
longer enough to be an efficient instrument for ascertaining. In a networked environment, the tool
isn't done working until it has delivered the goods or come as close as possible. For some
electronic materials the catalog or ancillary tools will be able to deliver the items; for materials
that are not online, the catalog should at least present the user with options for obtaining the
items: call an item from local stacks or storage, recall an item in circulation, fill out an ILL
request, connect to a document delivery service, etc.

Our catalogs have become one tool among many, but those many are not separate or isolated
from one another. The catalog is one tool in a network of tools. The basic or necessary principle
of tools in a network of tools is interoperability (defining interoperability is part of the charge
of Working Group 3). If interoperability is too high a demand, the tools in a network of tools must
at least be compatible with one another from the user's perspective. For example, a user employing an
institution's catalog to find data sets relating to census and voting in Hartford, Connecticut, may
well need to analyze the data once they have been located, and then format the analysis into a
presentation document dominated by images, not numbers. While the catalog per se is only one
tool in this scenario, it should be compatible with a wide range of other tools that may be used as
functional extensions of the discovery and retrieval process. What is desirable is a network of
tools that are portable, flexible, agile, mappable, extensible, and adaptable: a coherent network of
tools. The library catalog can be part of such a coherent environment, but only if it is designed,
maintained, and used as one tool in a network of tools.

The next step for this group may be to analyze these resource description needs in terms of future
needs.

Charge #2: Building a conceptual map of the resource description terrain/landscape and
developing models for accessing/using metadata both inside and outside the library community.

Working Group Number Two was charged with drawing up a conceptual map of the resource
description landscape. It submitted such a map. The map draws on the list discussion in the fall of 1998.

The map consists of five columns. For each resource category, the group has outlined what
modes of description and access existed before computers (if relevant), what exists currently, and
what it foresees for the future. For some resource categories, there are additional comments. Some
categories are amalgams of like materials, for example, pamphlets, vertical files, and offprints; or
technical reports, dissertations, etc. (See Appendix 1.)

The group will continue to work on developing models for accessing/using metadata based on its
conceptual map.

Charge #3: Devising a definition of metadata and investigating the interoperability of newly
emerging metadata schemes with the cataloging rules and MARC format.

The subcommittee decided to follow a two-phase procedure. In phase
one, three of the subgroup members would collect definitions of metadata,
interoperability, and metadata schema and submit them for open discussion and comment
on CC:DA's electronic mail list, metamarda-L. In phase two, these definitions would be
filtered and evaluated against AACR2 and the MARC format to devise one or more working definitions.

The three team members supplied a number of definitions culled from various sources, which were
posted to the list for consideration and open discussion by members
of the subcommittee and of the task force as a whole. Beginning with the term
"metadata," followed by "interoperability" and "metadata schema," each term received potential
definitions for consideration and discussion over a period of several weeks, followed by a period
of reevaluation and refinement and then by further discussion before the group proceeded to
the next term. Overall, this sequential process of volunteered contributions
resulted in 27 potential definitions for metadata, nine for interoperability, and ten for metadata
schema.

This method of operation proved to be contextual and evolving: definitions
submitted first were reconsidered and reworked in light of subsequent evaluations, discussion,
and new submissions. The discussion revealed that the varying definitions for each
concept appeared to depend on the intent and context behind each definition.
These deliberations also revealed that concepts found within the definitions of the terms being
considered were not as transparent and understandable as originally assumed. After further input
and comment on these developments were solicited, the consensus seemed to be that the terms
under consideration by the task force remain somewhat nebulous concepts that would benefit
from further refinement. One possible alternative to a denotative definition satisfactory to all
parties may be to offer a definition followed by examples that provide further
elaboration and clarification. This approach, along with the definitions submitted and considered
for all three terms, is included in Appendix 2. However, for purposes of this report, the majority
of participants felt that we should offer a best attempt at working definitions for all three
concepts.

The formal working definitions for the three terms deliberated and submitted by the task force
subcommittee for charge #3 follow below (submitted May 17, 1999):

METADATA are structured, encoded data that describe characteristics of information-bearing
entities to aid in the identification, discovery, assessment, and management of the described
entities.

INTEROPERABILITY is the ability of two or more systems or components to exchange
information and use the exchanged information without special effort on either system.

A METADATA SCHEME provides a formal structure designed to identify the knowledge
structure of a given discipline and to link that structure to the information of the discipline
through the creation of an information system that will assist the identification, discovery and use
of information within that discipline.
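To make the working definition of metadata concrete, the following Python sketch builds one small "structured, encoded" description and then reads it back. It assumes Dublin Core-style element names encoded as XML with the standard library. The choice of elements and the values (drawn from this report's own title, issuing body, and year) are illustrative assumptions, not a format the subcommittee endorsed.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Illustrative description of this report; element choice and values are assumptions.
record = {
    "title": "Task Force on Metadata: Summary Report",
    "creator": "CC:DA Task Force on Metadata",
    "date": "1999",
}


def encode_record(fields: dict[str, str]) -> str:
    """Encode a flat element->value mapping as XML (the 'structured, encoded' part)."""
    root = ET.Element("record")
    for name, value in fields.items():
        # Each element becomes a namespaced XML element such as <dc:title>.
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def decode_record(xml_text: str) -> dict[str, str]:
    """Parse the XML back into a mapping, so a program can read the same description."""
    root = ET.fromstring(xml_text)
    # Strip the "{namespace}" prefix from each tag to recover the element name.
    return {child.tag.split("}", 1)[1]: child.text or "" for child in root}


encoded = encode_record(record)
print(encoded)
```

The round trip through `decode_record` is the point of the example. Because the description is structured and encoded, a program can recover the same characteristics that a human reader sees.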

Using these working definitions, the group will continue to focus on interoperability of emerging
metadata schemes with cataloging rules and MARC.

Charge #4: Recommending ways in which libraries may best incorporate the use of metadata
schemes into current library methods.

At ALA Midwinter 1999, this group was asked how best to proceed in this direction. The group
came up with the following definition of what a prototype library catalog of the future would
look like:

let the patron use ONE search interface to access all information, whether it resides in
different metadata types and standards, in databases, or in OPAC(s)

provide the user with a seamless transition to all available information, moving from an
ILS front-end search mechanism that accesses numerous resources to a
search interface that can access all information available in any standard, format, location,
or subject. (Example: the interface can search the local OPAC, the World Wide Web, metadata
standards (EAD, TEI, GILS, Dublin Core), special collections, museum holdings, etc.,
and present results to the patron in a usable format through one search mechanism.)

Our definition of prototype is: virtually seamless access to information, and retrieval of relevant
information, from the user's point of view.

The following prototype library catalogs were mentioned by members of the Task Force. This
is merely a list that members have said MAY be prototypes; they have not been examined or
explored by members of this group for comparison with the definition of prototype. As a next
step, the group will analyze these prototypes further.

The groups will meet at the Annual 1999 meeting in New Orleans to determine how to
proceed further. The agenda for the meeting is as follows:

8:30-10:15

Call to order

Presentations on the four charges by leaders or other members
of the Midwinter breakout groups

Proposed preconference on metadata for Annual
2000: ideas for what "The Ideal Preconference
on Metadata" would have as subject matter, speakers, etc.

Plan for the coming year (the TF goes out of
existence at Annual 2000)

10:15-10:45 break

10:45-12:30

RDF and XML / Eric Miller; Diane Hillmann

Presentations

Questions and answers

Those interested in attending the RDF and XML session should RSVP to Mary Larsgaard,
since seating may be limited. Those who respond will get first choice of seating.

The Task Force will continue its deliberations with the goal of analyzing charge #5 and submitting rule revision recommendations as needed. Work should be completed by Annual 2000.

The Task Force is also involved in presenting a preconference on metadata at the ALA Annual
Conference in Chicago in 2000. Daniel Kinney is putting together a committee of the CC:DA Task Force
to plan the metadata preconference.

APPENDIX 1

APPENDIX 2

Tool to accomplish various processes. (Clifford Lynch, keynote speech, "Managing
Metadata for the Digital Library: Crosswalks or Chaos?")

A cloud of collateral information around a data object. (Clifford Lynch, ibid.)

Descriptive data about a resource that relieves the user of having to have full access to the
resource in order to know of its existence. (Don Waters, "I Know It's Out There but Where?", Metadata conference)

Data that documents or tracks the change or uses of data. (John Perkins, presentation
on the Coalition for the Interchange of Museum Information, Metadata conference)

All of it is just data: no meta-metadata. All data requires mechanisms for discovery,
management, and access control. (Clifford Lynch again, Metadata conference)

Metadata describe the content, quality, condition, and other characteristics of data. Metadata
help a person to locate and understand data. (FGDC content standard)

Meta is a prefix that in most information technology usages means an underlying definition
or description. Thus, metadata is a definition or description of data and metalanguage is a
definition or description of language. Meta (pronounced MEH-tah in the U.S. and MEE-tah in
the U.K.) derives from Greek, meaning "among," "with," "after," "change." Whereas in some English
words the prefix indicates change (for example, metamorphosis), in others, including those
related to data and information, the prefix carries the meaning of "more comprehensive" or
"fundamental." (whatis.com)

Data about data. Metadata describes how and when and by whom a particular set of data was
collected, and how the data is formatted. Metadata is essential for understanding information
stored in data warehouses. (Webopedia)

Metadata is data about data (for example, a library catalog is metadata, since it describes
publications) or, specifically in the context of this specification, data describing Web resources.
The distinction between data and metadata is not an absolute one; it is a distinction created
primarily by a particular application, and many times the same resource will be interpreted in
both ways simultaneously. (Resource Description Framework (RDF) Model and Syntax Specification)

Metadata is an abstraction from data. It is high-level data that describes lower-level
data. (Briefing paper, "What is metadata?", computerwire.com)

Structured description about an object or collection of objects, with
descriptive, administrative, and structural applications. (Roy Tennant, Internet Librarian 98 Conference, Monterey; supplied by George J. Janczyn)

The term metadata commonly refers to any data that aids in the
identification, description and location of networked electronic resources. (Hudgins, Jean, Grace Agnew, and Elizabeth Brown. 1999. Getting Mileage out of Metadata: Applications for
the Library. Chicago: American Library Association, p. 1; supplied by Erik Jul)

In data processing, meta-data is definitional data that provides
information about or documentation of other data managed within an
application or environment. For example, meta-data would document data about data elements or
attributes (name, size, data type, etc.), data about
records or data structures (length, fields, columns, etc.), and data about
data (where it is located, how it is associated, ownership, etc.). (Free
On-line Dictionary of Computing)

Information about data, or more specifically, the descriptive information provided in meta tags
in an HTML or XML document header about that document. (Glossary of Internet Terms)

The descriptive information meta-data supplies allows the user to locate, evaluate, access,
and manage online available learning resources. Describing learning resources (whether
materials, activities, people or enterprises) digitally is similar in the physical world to attaching a label to an object, such as a can of peas or a package of light bulbs. The label provides information about the contents of its container without having to actually open the container. . . . Examples of meta-data for learning materials are the title, the author, the targeted learning level, and the educational objectives of the material. . . . Meta-data is distinct from, but intimately related to, its contents. . . . Overall, meta-data serves as a complement to its contents and reflects the contents' attributes to interested users. (Instructional Management Systems Project, "What is Meta-data?")

The Meta Data Coalition (formerly Metadata Coalition) regroups vendors and users allied with a common purpose of driving forward the definition, implementation and ongoing evolution of a meta data interchange format and its support mechanisms. The need for such standards arises as meta data, or the information about the enterprise data, emerges as a critical element in effective data management. (The Meta Data Coalition; supplied by Vianne Sha)

Metadata is data about data. Cataloging in a library setting is an example of metadata. But in
the Internet environment that involves commerce and services as well as objects, it has more
functions than description and resource discovery:

intellectual property rights, including the contractual terms related to the document's use and
distribution

electronic commerce to encode prices, terms of payment, etc.

content rating to disclose the nature of a particular page's
contents, which can be used in filtering content so that, for example,
parents can block inappropriate material from children

digital signatures: can you trust this document?

privacy issues: what information do browsers collect when you visit?
what information are users willing to disclose about themselves when
visiting a Web site?

Metadata is a means of assigning descriptive tags to government
information so that it is easily searchable and retrievable by clients
from any electronic or online tool, be it an Internet connection
through their home or business PC, via a public kiosk or by using a call
centre as the middle access environment (the Government Information
Centre is a good example of this environment). In nontechnical terms,
then, metadata is information that describes information.

Some characteristics of metadata:

It is readable by both humans and machines.

Metadata takes a variety of forms, both specialized (VRA) and
general (DC), and may be part of a larger framework (TEI).

New metadata sets will develop as the networked information
infrastructure matures.

Different communities will propose, design, and maintain different
types of metadata. (Kathleen Forsythe and Diana Brooking, two librarians from the University of
Washington Libraries, in an excellent presentation at
the Online Northwest Annual Conference; supplied by Mark Watson)

Metadata. An encoded description of an information package (e.g., an AACR2 record
encoded with MARC, a Dublin Core record, a GILS record, etc.); the purpose of metadata is to
provide an intermediate level at which choices can be made as to which information packages
one wishes to view or search, without having to search massive amounts of irrelevant full
text. (Arlene Taylor)

Metadata is data associated with objects which relieves their potential
users of having to have full advance knowledge of their existence or
characteristics. A user might be a program or a person, and metadata may support a variety of
uses or operations. (Rebecca Guenther)

Metadata is a structured description of the content, quality, condition, usage, and other
characteristics of data. They enable users to discover, locate, understand, and evaluate data.
They also enable administrators to manage data and control access to them.

Metadata is data about data, structured to meet the needs of information holders, managers,
and users. It helps users to discover, locate, understand, and evaluate data, and helps data
administrators to manage data and control access and use. For example, metadata may describe
how, when, why, and by whom a particular data object or set of data was collected or created,
what its content is, how it is formatted, and the conditions for its use. (Clare Imholtz)

Metadata is a structured, encoded description of an information package. Metadata provides
an intermediate level at which viewing or searching choices can be made in light of data
characteristics like content, quality, condition and usage. Metadata enables users to find,
identify, select and obtain information packages. Metadata also enables administrators to
manage information packages and control access to them. (Mark Watson)

METADATA is a structured, encoded description
of an information package which serves an intermediary role
between the user and the information package by describing
data characteristics and allowing for viewing or searching
choices to be made. Metadata enables users to find, identify,
select and obtain information packages and also enables
administrators to manage information packages and control
access to them.

Metadata, pl. n. (used with a sing. or pl. verb): 1. Surrogate information about or related to a
resource. 2. A structured, encoded surrogate.

Definition of surrogate: The OED offers: A. sb. 1. A person appointed by authority to act in
place of another; a deputy. and 2. a. fig. and gen. A person or (usually) a thing that acts for or
takes the place of another; a substitute. Const. for, of. b. spec. = substitute sb. 6 b. Following
up on the above suggestion of substitute, we find my personal favorite: 6. In technical use. c.
Mech. A short section used when a full-length section is not usable.

Interoperability:

Interoperability is the ability of a system or a product to work with other systems or products
without special effort on the part of the customer. Products achieve interoperability with other
products using either or both of two approaches: (1) by adhering to published interface standards, or
(2) by making use of a broker of services that can convert one product's interface into another
product's interface on the fly. (Adapted from whatis.com)

A good example of the first approach is the set of standards that have been developed for the
World Wide Web. These standards include TCP/IP, HTTP, and HTML. The second kind of
interoperability approach is exemplified by the Common Object Request Broker Architecture
(CORBA) and its Object Request Broker (ORB).
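The broker approach can be pictured as a crosswalk applied on the fly: a record arriving in one scheme is translated into the structure another system expects. The Python below is a minimal, hypothetical sketch. Its element-to-tag table is a deliberately partial simplification loosely modeled on published Dublin Core-to-MARC crosswalks and should not be read as an authoritative mapping. The function and variable names are invented.

```python
# Hypothetical, deliberately partial crosswalk from Dublin Core-style element names
# to (MARC tag, subfield code). Tag choices are illustrative, not authoritative.
DC_TO_MARC: dict[str, tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "description": ("520", "a"),
    "subject": ("653", "a"),
}


def dc_to_marc(dc_record: dict[str, list[str]]) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Broker-style translation: convert each DC value into a (tag, subfield, value) triple.

    Elements missing from the crosswalk are returned separately instead of being
    dropped silently, since a crosswalk may not cover every element.
    """
    fields: list[tuple[str, str, str]] = []
    unmapped: list[str] = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    # Sorting the triples orders them by tag first, as MARC fields are conventionally listed.
    fields.sort()
    return fields, unmapped


incoming = {
    "title": ["Task Force on Metadata: Summary Report"],
    "creator": ["CC:DA Task Force on Metadata"],
    "date": ["1999"],
    "language": ["en"],  # not covered by this partial table
}
fields, unmapped = dc_to_marc(incoming)
for tag, code, value in fields:
    print(f"{tag} ${code} {value}")
print("unmapped:", unmapped)
```

The function reports unmapped elements (here, `language`) instead of discarding them. This makes the cost of a partial crosswalk visible, which matters when judging whether two systems are truly interoperable.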

Compatibility is a related term. A product is compatible with a standard but interoperable with
other products that meet the same standard (or achieve interoperability through a broker).

Interoperability is the ability of two or more systems or components to exchange information
and to use the information that has been exchanged. [IEEE 90]

Interoperability is: a. The ability of systems, units, or forces to provide services to and accept
services from other systems, units, or forces and to use the services so exchanged to enable them
to operate effectively together. [JP1] b. The condition achieved among
communications-electronics systems or items of communications-electronics equipment when
information or services can be exchanged directly and satisfactorily between them and/or their
users. The degree of interoperability should be defined when referring to specific cases. (Federal Standard 1037C, Glossary of Telecommunication Terms)

Interoperability means the easy integration of products from multiple vendors without the
need for custom hardware or software. (Lonmark Association)

Interoperability is the ability to use documents created with one DTD for a particular purpose
within another environment. For example, any DTD that uses TEI Extended Pointers for linking
creates documents which are interoperable, as regards linking, with TEI-encoded documents.
This interoperability applies even if the DTD uses no tags in common with TEI at all, since the
extended pointer mechanism is controlled by a particular use of attributes, not by specific
element types. Panorama Pro actually supports this particular type of interoperability, by offering
built-in support for TEI Extended Pointers. (CIMI briefing paper on DTD interoperability)

Interoperability is defined as the ability of two or more systems or components to exchange
and use information and the ability of systems to provide and receive services from other systems
and to use the services so interchanged to enable them to operate effectively together. (Police
Information Technology Organisation)

INTEROPERABILITY is the ability of two or more systems or components to exchange
information and use the exchanged information without special effort on either system.

INTEROPERABILITY is the ability of two or more systems or components to exchange and
to use information, the ability to provide and receive services from other systems, and the ability
to use the services so interchanged to enable them to operate effectively together.

Metadata Schema:

Australian Geodynamics Cooperative Research Centre (AGCRC) Metadata Schema
Local schema based on existing system:

Schemas: The diversity of metadata needs on the Web requires an
infrastructure that supports the coexistence of complementary,
independently maintained metadata packages. The World Wide Web Consortium (W3C) has
begun implementing an architecture for metadata for the Web. The Resource Description
Framework, or RDF, is designed to support the many different metadata needs of vendors and
information providers. The Dublin Core Metadata Initiative expects to support the infrastructure
for registries provided by RDF Schemas.

Interface Data Repository (IDR)

The IDR metadata is organized as a standard relational schema. At present
we have identified six tables that need to be maintained: Bundles,
Stores, Blocks, Block-Equivalency, Movement, and Movement-Associations.
This section describes the content and maintenance of these tables in an
informal form. For those familiar with RDBMS notation, the Appendix
contains a formal relational schema.

Dublin Core Workshop Series

The development of formal ontologies is currently a prominent line of
research in digital library communities, aimed at identifying the
structure of knowledge in a given discipline, and linking these
structures into a larger whole. In contrast, one might think of this
workshop series as an attempt to identify an emergent ontology, that
is, a consensus among experienced practitioners across many disciplines
about the basic elements of resource discovery.

(A metadata schema is) information that assists the identification,
discovery and transaction processes. (It) provide(s) clients with the
opportunity to develop the knowledge that information exists (visibility
or identification); assist clients to access or discover the required
information; and assist clients in the conduct of their business
transactions with government using interoperable business systems.

NABIR

(A metadata schema contains) guidelines to be used by Natural and
Accelerated Bioremediation Research (NABIR) program investigators in
managing their information and data. . . . Includes what these guidelines
will and will not do. (E.g.) They will not specify how investigators will
handle their data and analysis within their research projects. They will
not specify how investigators will exchange data among co-workers
(although voluntary agreement to some standards in this area will be
strongly encouraged). The guidelines speak primarily to the format and
documentation of data and information developed by the investigators that
are to be transferred to the NABIR program, either for communicating
research results or for basic long-term archiving and distribution of
site characterization data to the larger user community.

A metadata schema provides an ontology aimed at identifying the
structure of knowledge in a given discipline and linking these structures into a larger whole
through the creation of a system of information that assists the identification, discovery and
transaction processes of the given discipline structures.

Metadata schema: A formal specification of the semantics and structure of a coherent
collection of attributes that can be assigned in the description of a resource, as well as constraints
that may apply to such descriptions.
Stu Weibel notes:

Formal in this definition means that it is maintained by an authoritative
agency. It implies as well that the specification itself has a well-defined
structure of fields, field labels, and permissible data types.

Semantics refers to the human-understandable meaning of the attributes.

Structure refers to the encoding characteristics of the value (is it a
discrete value? a range? an ordered compound value? . . . )

Coherent is used in the sense of an orderly relation of elements that are
constituents of a logical whole.

Attributes are fields or elements, which may in turn have substructure.

Constraints may include number and order of possible values, optionality,
permissible data types, specification of collating sequence, etc.
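Weibel's components can be mapped onto a small data structure. The Python sketch below is a hypothetical illustration and does not represent any published schema language. Each attribute carries its semantics (a human-readable meaning), a permissible value type, and constraints on optionality and number of values. A validator then reports where a record breaks those constraints. Structure is reduced to a single value type per attribute; ranges, ordered compound values, and collating sequences are not modeled. The attribute names and rules are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeSpec:
    """One attribute (field or element) of a hypothetical metadata schema."""
    name: str
    meaning: str                      # semantics: the human-understandable meaning
    value_type: type                  # permissible data type (simplified stand-in for structure)
    required: bool = False            # constraint: optionality
    max_values: Optional[int] = None  # constraint: number of values (None = unlimited)


# A hypothetical schema; names, meanings, and rules are invented for illustration.
SCHEMA = [
    AttributeSpec("title", "Name given to the resource", str, required=True, max_values=1),
    AttributeSpec("creator", "Entity primarily responsible for the content", str),
    AttributeSpec("year", "Year the resource was issued", int, max_values=1),
]


def validate(record: dict[str, list], schema: list[AttributeSpec]) -> list[str]:
    """Return constraint violations for a record; an empty list means it conforms."""
    errors: list[str] = []
    known = {spec.name for spec in schema}
    # Attributes outside the schema are not part of the coherent collection.
    for name in record:
        if name not in known:
            errors.append(f"{name}: not defined in schema")
    for spec in schema:
        values = record.get(spec.name, [])
        if spec.required and not values:
            errors.append(f"{spec.name}: required but missing")
        if spec.max_values is not None and len(values) > spec.max_values:
            errors.append(f"{spec.name}: at most {spec.max_values} value(s) allowed")
        for value in values:
            if not isinstance(value, spec.value_type):
                errors.append(
                    f"{spec.name}: expected {spec.value_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return errors


print(validate({"title": ["Summary Report"], "year": [1999]}, SCHEMA))
print(validate({"title": ["A", "B"], "year": ["1999"], "format": ["text"]}, SCHEMA))
```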

Aimed at identifying the structure of knowledge in a given
discipline and linking these structures into a larger whole, a metadata scheme assists users in the
identification, discovery and use of information. A metadata scheme accomplishes this objective
by providing a formal specification of the semantics and structure of a coherent collection of
attributes that can be assigned in the description of information within a respective discipline.

Aimed at identifying the structure of knowledge in a given discipline and linking these
structures into a larger whole, a metadata scheme provides a formal specification of the
semantics and structure of a coherent collection of attributes that can be assigned in the
description of information within a respective discipline. (Mark Watson)