3.
Definitions (1) <ul><li>Metadata: </li></ul><ul><ul><li>A relatively new term that is used to describe a very old concept </li></ul></ul><ul><ul><li>We primarily need to think about the different functions it enables, e.g. discovery and access management, the management of resources, long-term preservation, etc. </li></ul></ul>

5.
Definitions (3) <ul><li>Metadata for research data: </li></ul><ul><ul><li>Metadata are fundamentally important to the continued understanding and exploitation of research data </li></ul></ul><ul><ul><ul><li>It is “impossible to conduct a correct analysis of a data set without knowing how the data was cleaned, calibrated, what parameters were used in the process” (Deelman et al., 2004) </li></ul></ul></ul><ul><ul><ul><li>In some cases, extremely detailed documentation will be required </li></ul></ul></ul><ul><ul><ul><li>Captured at various stages of the data lifecycle </li></ul></ul></ul>

8.
The OAIS Information Model (3) <ul><li>Representation Information: </li></ul><ul><ul><li>Is tightly bound with the Data Object </li></ul></ul><ul><ul><li>Provides a bridge between the bit-level information being stored in an OAIS and something that can be understood </li></ul></ul><ul><ul><li>Describing data structure concepts, or formats (Structure Information) </li></ul></ul><ul><ul><li>Providing additional information on semantics (Semantic Information) </li></ul></ul>
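
The "bridge" role of Representation Information can be illustrated with a minimal Python sketch. It assumes a hypothetical six-byte binary record; the field names, units, and the `interpret` helper are invented for illustration and are not part of OAIS. Structure Information says how to split the bits into values, and Semantic Information says what those values mean.

```python
import struct

# The Data Object: an opaque bit sequence as it might sit in the archive.
data_object = struct.pack("<Hf", 42, 21.5)

# Structure Information (hypothetical): little-endian, one unsigned
# 16-bit integer followed by one 32-bit float.
structure_info = "<Hf"

# Semantic Information (hypothetical): the meaning and unit of each field.
semantic_info = [
    ("station_id", None),
    ("air_temperature", "degrees Celsius"),
]


def interpret(bits, structure, semantics):
    """Combine both kinds of Representation Information with the bits."""
    values = struct.unpack(structure, bits)
    return {
        name: {"value": value, "unit": unit}
        for (name, unit), value in zip(semantics, values)
    }


record = interpret(data_object, structure_info, semantic_info)
```

Without `structure_info` the six bytes cannot be decoded at all, and without `semantic_info` the decoded numbers have no meaning, which is why the slide describes both as tightly bound to the Data Object.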

9.
The OAIS Information Model (4) <ul><li>Preservation Description Information: </li></ul><ul><ul><li>The additional information “needed to make the Content Information meaningful for the indefinite long-term” (p. 4-33) </li></ul></ul><ul><ul><ul><li>For example, the information “needed to preserve the Content Information, to ensure that it is clearly identified, and to understand the environment in which the Content Information was created” (p. 2-6) </li></ul></ul></ul><ul><ul><ul><li>Reference, Context, Provenance, Fixity </li></ul></ul></ul>
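
As a rough illustration of the four PDI categories, the sketch below groups Reference, Context, Provenance, and Fixity into one record and uses a checksum for the Fixity part. The class, field names, and the `example:dataset-1` identifier are assumptions made for this example; OAIS itself does not prescribe any particular encoding or digest algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescription:
    reference: str         # Reference: identifier for the content
    context: str           # Context: why and where the content was created
    provenance: list       # Provenance: history of the content
    fixity_algorithm: str  # Fixity: digest algorithm name
    fixity_digest: str     # Fixity: digest recorded at ingest


def describe(content, reference, context, provenance):
    """Build a PDI record, computing a SHA-256 digest for fixity."""
    digest = hashlib.sha256(content).hexdigest()
    return PreservationDescription(
        reference, context, list(provenance), "sha256", digest
    )


def fixity_ok(content, pdi):
    """Return True if the content still matches the recorded digest."""
    current = hashlib.new(pdi.fixity_algorithm, content).hexdigest()
    return current == pdi.fixity_digest


pdi = describe(
    b"1,2,3\n",
    reference="example:dataset-1",
    context="Hypothetical field survey",
    provenance=["created by instrument", "ingested into archive"],
)
```

A later call such as `fixity_ok(stored_bytes, pdi)` would then indicate whether the stored bits have changed since ingest.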

11.
The OAIS Information Model (6) <ul><li>Lessons from OAIS (2): </li></ul><ul><ul><li>It highlights the importance of preserving context and provenance (but these are quite vaguely defined) </li></ul></ul><ul><ul><li>OAIS works on an abstract level, but there is a need to think about what needs to be done in practical terms to develop preservation metadata schemata ... </li></ul></ul>

12.
PREMIS Data Dictionary (1) <ul><li>Background (1): </li></ul><ul><ul><li>PREMIS Working Group (2003-2005) </li></ul></ul><ul><ul><li>An attempt to develop something that would be implementable </li></ul></ul><ul><ul><li>Development informed by OAIS model </li></ul></ul><ul><ul><li>Built upon several initiatives that had been developing preservation metadata schemas and frameworks prior to 2003 </li></ul></ul><ul><ul><li>Data Dictionary first published in May 2005; v. 2.0 in March 2008 </li></ul></ul>

17.
PREMIS Data Dictionary (6) <ul><li>PREMIS usage (1): </li></ul><ul><ul><li>Survey undertaken for PREMIS Maintenance Activity (2007) </li></ul></ul><ul><ul><ul><li>16 repositories and projects surveyed (mostly dealing with documents rather than data) </li></ul></ul></ul><ul><ul><ul><li>Survey noted much diversity in the way PREMIS had been implemented </li></ul></ul></ul><ul><ul><ul><li>Tools were being used to capture technical metadata automatically </li></ul></ul></ul><ul><ul><ul><li>Formats could be identified using tools like JHOVE and PRONOM DROID </li></ul></ul></ul>

18.
PREMIS Data Dictionary (7) <ul><li>PREMIS usage (2): </li></ul><ul><ul><li>No major eScience input into PREMIS </li></ul></ul><ul><ul><li>PREMIS is occasionally used to help inform the preservation of research data: </li></ul></ul><ul><ul><ul><li>The National Snow and Ice Data Centre has used PREMIS as a way of evaluating its own OAIS-inspired metadata schema </li></ul></ul></ul><ul><ul><ul><li>The Stanford Digital Repository has experimented with using PREMIS for geospatial resources </li></ul></ul></ul><ul><ul><ul><li>Experiments with the Yale Social Science Data Archive </li></ul></ul></ul>

19.
PREMIS Data Dictionary (8) <ul><li>Lessons from PREMIS: </li></ul><ul><ul><li>The Data Model demonstrates the importance of recording the contexts of preservation (events, agents), not just metadata on the objects </li></ul></ul><ul><ul><li>Currently little used in the e-research domain, but it has some potential where structured metadata already exists in some form (e.g., CSDGM, DDI) </li></ul></ul>
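
The point about recording events and agents, not just objects, can be sketched as follows. This is a simplified illustration loosely inspired by the PREMIS entities, not the PREMIS schema itself: the class and field names, identifiers, and the `record_event` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Agent:
    identifier: str
    name: str


@dataclass(frozen=True)
class Event:
    event_type: str   # e.g. "ingestion", "fixity check", "migration"
    when: datetime
    outcome: str
    agent_id: str     # link to the agent that carried out the event
    object_id: str    # link to the object the event acted on


@dataclass
class ObjectRecord:
    identifier: str
    format_name: str
    events: list = field(default_factory=list)


def record_event(obj, agent, event_type, outcome, when=None):
    """Attach a preservation event to an object, linked to its agent."""
    timestamp = when or datetime.now(timezone.utc)
    event = Event(event_type, timestamp, outcome, agent.identifier, obj.identifier)
    obj.events.append(event)
    return event


dataset = ObjectRecord("obj-1", "text/csv")
checker = Agent("agent-1", "Hypothetical checksum service")
check = record_event(dataset, checker, "fixity check", "pass")
```

Keeping events as separate records linked to both object and agent means the history of what was done, when, and by whom survives independently of the technical metadata about the object.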

20.
Implications for e-research (1) <ul><li>The role of standards </li></ul><ul><ul><li>The development of standards (e.g. PREMIS) assumes that there is some level of commonality between domains </li></ul></ul><ul><ul><li>However, generic solutions are not really feasible for e-research data because of the diversity and complexity of: </li></ul></ul><ul><ul><ul><li>Research data (content) </li></ul></ul></ul><ul><ul><ul><li>Research contexts </li></ul></ul></ul><ul><ul><ul><li>Stakeholders </li></ul></ul></ul>

23.
Diversity and complexity (3) <ul><li>There is an even wider range of social contexts in which data is used (and shared) </li></ul><ul><ul><li>DCC SCARP project has been exploring disciplinary factors in curation practice </li></ul></ul><ul><ul><ul><li>Practice even within single disciplines is very fragmented </li></ul></ul></ul><ul><ul><ul><li>Case studies ongoing </li></ul></ul></ul><ul><ul><ul><ul><li>Big-science archives, medical and social sciences, architecture and engineering, biological images </li></ul></ul></ul></ul>

26.
Implications for e-research (2) <ul><li>Metadata for digital curation or for long-term preservation? </li></ul><ul><ul><li>The concept of digital curation focuses on reuse and adding value; long-term preservation is not always the aim </li></ul></ul><ul><ul><li>PREMIS metadata is focused on particular things (viability, renderability, understandability, authenticity and integrity) </li></ul></ul><ul><ul><li>What metadata do we need for digital curation? Could this ever be generic? </li></ul></ul>

27.
Implications for e-research (3) <ul><li>Metadata can be difficult to identify </li></ul><ul><ul><li>Difficult sometimes to work out where data ends and metadata begins </li></ul></ul><ul><ul><li>Depends on the point of view of the researcher </li></ul></ul>

28.
Implications for e-research (4) <ul><li>Lifecycle view </li></ul><ul><ul><li>Metadata has to be captured at multiple places in the scientific workflow </li></ul></ul><ul><ul><li>Need to capture: </li></ul></ul><ul><ul><ul><li>Processes (can be driven by instrumentation) </li></ul></ul></ul><ul><ul><ul><li>Provenance </li></ul></ul></ul><ul><ul><ul><li>Context </li></ul></ul></ul>
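
One way to picture metadata capture at multiple points in a workflow is to have each processing step record itself as it runs. The sketch below is only an illustration under that assumption: the `recorded_step` decorator, the step functions, and their parameters are hypothetical and do not come from any particular workflow system.

```python
import functools
from datetime import datetime, timezone

# Provenance entries accumulate here, in the order the steps ran.
provenance_log = []


def recorded_step(func):
    """Wrap a processing step so its name and parameters are logged."""
    @functools.wraps(func)
    def wrapper(data, **params):
        result = func(data, **params)
        provenance_log.append({
            "step": func.__name__,
            "parameters": dict(params),
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@recorded_step
def clean(values, *, drop_below):
    # Discard readings below a threshold.
    return [v for v in values if v >= drop_below]


@recorded_step
def calibrate(values, *, offset):
    # Apply a constant calibration offset.
    return [v + offset for v in values]


raw = [0.1, 2.0, 3.5]
result = calibrate(clean(raw, drop_below=1.0), offset=0.5)
```

After the run, `provenance_log` lists the cleaning and calibration steps with the parameters used, which is the kind of process information a later analyst would need in order to interpret `result`.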

29.
Implications for e-research (5) <ul><li>Big science, little science: </li></ul><ul><ul><li>Big science is by its nature data driven, and will often develop appropriate frameworks for its management and reuse (data centres, data grids) </li></ul></ul><ul><ul><li>Other scientific domains (e.g., ecology, biodiversity, chemistry) are moving in the same direction, but their data retain a high level of diversity and complexity </li></ul></ul>

30.
Summing-up <ul><li>The OAIS Information Model provides an abstract framework for thinking about preservation metadata </li></ul><ul><li>PREMIS provides an implementation framework that is beginning to be adopted in some domains </li></ul><ul><li>There are still many unresolved questions when it comes to defining metadata for research data </li></ul>

31.
Acknowledgements The Digital Curation Centre is funded by the JISC and the UK Research Councils' e-Science Core Programme. http://www.dcc.ac.uk/ UKOLN is funded by the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. http://www.ukoln.ac.uk/